Author: apalumbo
Date: Sun Mar 29 18:54:00 2015
New Revision: 1669950
URL: http://svn.apache.org/r1669950
Log:
add references to bank marketing dataset and Frank's blog on SGD page.
Modified:
mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext?rev=1669950&r1=1669949&r2=1669950&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext Sun Mar 29 18:54:00 2015
@@ -13,10 +13,14 @@ The Mahout implementation uses Stochasti
large training sets to be used.
For a more detailed analysis of the approach, have a look at the [thesis of
-Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en).
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en) [1].
See MAHOUT-228 for the main JIRA issue for SGD.
+A more detailed overview of the Mahout Logistic Regression classifier and a [detailed description of building a Logistic Regression classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/) for the classic [Iris flower dataset](http://en.wikipedia.org/wiki/Iris_flower_data_set) are also available [2].
+
+An example of training a Logistic Regression classifier on the [UCI Bank Marketing Dataset](http://mlr.cs.umass.edu/ml/datasets/Bank+Marketing) can be found [on the Mahout website](http://mahout.apache.org/users/classification/bankmarketing-example.html) [3].
+
<a name="LogisticRegression-Parallelizationstrategy"></a>
## Parallelization strategy
@@ -53,7 +57,7 @@ include
* The evolutionary optimization system (found in org.apache.mahout.ep)
<a name="LogisticRegression-Featurevectorencoding"></a>
-### Feature vector encoding
+## Feature vector encoding
Because the SGD algorithms need to have fixed length feature vectors and
because it is a pain to build a dictionary ahead of time, most SGD
@@ -78,7 +82,7 @@ Here is a class diagram for the encoders

<a name="LogisticRegression-SGDLearning"></a>
-### SGD Learning
+## SGD Learning
For the simplest applications, you can construct an
OnlineLogisticRegression and be off and running. Typically, though, it is
@@ -104,3 +108,15 @@ TrainNewsGroups example code.

+## References
+
+[1] [Thesis of
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en)
+
+[2] [An Introduction To Mahout's Logistic Regression SGD Classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/)
+
+## Examples
+
+[3] [SGD Bank Marketing Example](http://mahout.apache.org/users/classification/bankmarketing-example.html)
+
+