Author: apalumbo
Date: Tue Jun 24 14:35:19 2014
New Revision: 1605096
URL: http://svn.apache.org/r1605096
Log:
MAHOUT-1587: Update website to reflect move to GitHub
Modified:
mahout/site/mahout_cms/trunk/content/users/classification/bayesian.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/classification/bayesian.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/bayesian.mdtext?rev=1605096&r1=1605095&r2=1605096&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/bayesian.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/bayesian.mdtext Tue Jun 24 14:35:19 2014
@@ -37,7 +37,7 @@ As we can see, the main difference betwe
### Running from the command line
-Mahout provides CLI drivers for all above steps. Here we will give a simple overview of Mahout CLI commands used to preprocess the data, train the model and assign labels to the training set. An [example script](https://svn.apache.org/repos/asf/mahout/trunk/examples/bin/classify-20newsgroups.sh) is given for the full process from data acquisition through classification of the classic [20 Newsgroups corpus](https://mahout.apache.org/users/classification/twenty-newsgroups.html).
+Mahout provides CLI drivers for all above steps. Here we will give a simple overview of Mahout CLI commands used to preprocess the data, train the model and assign labels to the training set. An [example script](https://github.com/apache/mahout/blob/master/examples/bin/classify-20newsgroups.sh) is given for the full process from data acquisition through classification of the classic [20 Newsgroups corpus](https://mahout.apache.org/users/classification/twenty-newsgroups.html).
- **Preprocessing:**
For a set of Sequence File Formatted documents in PATH_TO_SEQUENCE_FILES the [mahout seq2sparse](https://mahout.apache.org/users/basics/creating-vectors-from-text.html) command performs the TF-IDF transformations (-wt tfidf option) and L2 length normalization (-n 2 option) as follows: