[
https://issues.apache.org/jira/browse/MAHOUT-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169223#comment-13169223
]
[email protected] commented on MAHOUT-918:
------------------------------------------------------
bq. On 2011-12-13 13:24:28, Ted Dunning wrote:
bq. > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java, lines 36-41
bq. > <https://reviews.apache.org/r/3072/diff/4/?file=64283#file64283line36>
bq. >
bq. > Direct and exact quotes from the paper should be either avoided or
bq. > acknowledged. Better here to rephrase the language.
Rephrased the language in revision 5.
bq. On 2011-12-13 13:24:28, Ted Dunning wrote:
bq. > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java, lines 60-63
bq. > <https://reviews.apache.org/r/3072/diff/4/?file=64283#file64283line60>
bq. >
bq. > Again, just quoting the paper is not a good idea. This isn't adding
bq. > any information in any case since the exact same language was used in the class
bq. > level java doc.
bq. >
bq. > It would be nice here to note that the average is an *unweighted*
bq. > average.
Rephrased the language in revision 5.
bq. On 2011-12-13 13:24:28, Ted Dunning wrote:
bq. > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java, lines 87-88
bq. > <https://reviews.apache.org/r/3072/diff/4/?file=64284#file64284line87>
bq. >
bq. > This looks like a bad key to use here.
This key should be the average log-likelihood of the best
OnlineLogisticRegression in AdaptiveLogisticRegression.
bq. On 2011-12-13 13:24:28, Ted Dunning wrote:
bq. > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java, line 40
bq. > <https://reviews.apache.org/r/3072/diff/4/?file=64284#file64284line40>
bq. >
bq. > I don't think that this is correct. Is this really what the output
bq. > is? Why are you dividing by a weight vector? How do you compute this score?
bq. >
bq. > Or do you mean to not divide here?
bq. >
bq. > If so, why do you use a score as the key?
My explanation may have been unclear. The comment means that the Map output key
is the score and the Map output value is the new weight vector.
I rewrote the comment in revision 5.
bq. On 2011-12-13 13:24:28, Ted Dunning wrote:
bq. > trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionReducer.java, lines 34-35
bq. > <https://reviews.apache.org/r/3072/diff/4/?file=64285#file64285line34>
bq. >
bq. > I don't think that this is correct. In the google paper, the
bq. > average was unweighted. In any case how do you compute this score for
bq. > weighting?
bq. >
bq. > Also, if the key is the score, how does the reducer work since each
bq. > reduce function will only see one score? Are you assuming that there is
bq. > exactly one reducer?
The original paper (http://aclweb.org/anthology-new/N/N10/N10-1069.pdf)
describes a weighted average, but in a simple experiment of mine the unweighted
average performed better than the weighted one.
So I changed the code to use the unweighted average in revision 5.
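For readers following the thread, the difference between the two averaging schemes can be sketched as below. This is only an illustration, not the code in the patch: the class ParameterMixing and its methods are hypothetical names, and the mixing weights are assumed to be non-negative numbers supplied by the caller (the thread does not say how a weight would be derived from a score).

```java
import java.util.List;

/** Hypothetical helper (not part of the patch): mixes per-mapper weight vectors. */
final class ParameterMixing {
  private ParameterMixing() {}

  /** Unweighted mix: every mapper's vector contributes equally. */
  static double[] unweightedAverage(List<double[]> vectors) {
    return weightedAverage(vectors, null);
  }

  /**
   * Weighted mix: vector i contributes in proportion to mixWeights[i].
   * Passing null for mixWeights means "all weights equal".
   * Assumption: weights are non-negative and their sum is positive.
   */
  static double[] weightedAverage(List<double[]> vectors, double[] mixWeights) {
    if (vectors.isEmpty()) {
      throw new IllegalArgumentException("no vectors to average");
    }
    int dim = vectors.get(0).length;
    double[] result = new double[dim];
    double total = 0.0;
    for (int i = 0; i < vectors.size(); i++) {
      double[] v = vectors.get(i);
      if (v.length != dim) {
        throw new IllegalArgumentException("vectors differ in length");
      }
      double w = (mixWeights == null) ? 1.0 : mixWeights[i];
      total += w;
      for (int j = 0; j < dim; j++) {
        result[j] += w * v[j];
      }
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("mixing weights must have a positive sum");
    }
    // Normalize so the result is a convex combination of the inputs.
    for (int j = 0; j < dim; j++) {
      result[j] /= total;
    }
    return result;
  }
}
```

With the vectors (1, 3) and (3, 5), the unweighted average is (2, 4), while mixing weights of 3 and 1 give (1.5, 3.5).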
The number of reducers should be set to one; I added a comment to that effect in
revision 5.
It is set in the runIteration function of the Driver class.
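To illustrate why a single reducer matters when the key is the score: each reduce call sees only the vectors that share one key, so an average over all mappers has to be accumulated across calls inside one reducer task. The sketch below simulates that with plain Java collections instead of the Hadoop API. It is an assumption about how such a reducer could work, not a description of the patch; SingleReducerSketch, AveragingReducer and runWithOneReducer are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of "one reducer, score as key" (no Hadoop classes used). */
final class SingleReducerSketch {
  private SingleReducerSketch() {}

  /** Stand-in for a reducer task: reduce() is called once per distinct key. */
  static final class AveragingReducer {
    private double[] sum;
    private int count;

    /** Adds this key's vectors to the running sum; the score itself is ignored. */
    void reduce(double score, List<double[]> vectors) {
      for (double[] v : vectors) {
        if (sum == null) {
          sum = new double[v.length];
        }
        for (int j = 0; j < v.length; j++) {
          sum[j] += v[j];
        }
        count++;
      }
    }

    /** Called once after the last key: emits the unweighted average. */
    double[] cleanup() {
      if (count == 0) {
        throw new IllegalStateException("no input");
      }
      double[] avg = new double[sum.length];
      for (int j = 0; j < sum.length; j++) {
        avg[j] = sum[j] / count;
      }
      return avg;
    }
  }

  /** Simulates the shuffle: groups map output by score and feeds ONE reducer. */
  static double[] runWithOneReducer(double[] scores, List<double[]> vectors) {
    // Equal scores land in the same group, as equal keys would in a shuffle.
    Map<Double, List<double[]>> grouped = new TreeMap<>();
    for (int i = 0; i < scores.length; i++) {
      grouped.computeIfAbsent(scores[i], k -> new ArrayList<>()).add(vectors.get(i));
    }
    AveragingReducer reducer = new AveragingReducer();
    for (Map.Entry<Double, List<double[]>> e : grouped.entrySet()) {
      reducer.reduce(e.getKey(), e.getValue());
    }
    return reducer.cleanup();
  }
}
```

If the groups were spread over several reducer tasks instead, each task would average only its own subset of vectors, which is the situation the review comment asks about.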
- issei
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3072/#review3875
-----------------------------------------------------------
On 2011-12-14 08:59:29, issei yoshida wrote:
bq.
bq.
bq. (Updated 2011-12-14 08:59:29)
bq.
bq.
bq. Review request for mahout.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. MAHOUT-918 Parallelized SGD in MapReduce
bq.
bq.
bq. This addresses bug MAHOUT-918.
bq. https://issues.apache.org/jira/browse/MAHOUT-918
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/PassiveAggressive.java 1214116
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionDriver.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapper.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionReducer.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionDriver.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionMapper.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionReducer.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveDriver.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveMapper.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveReducer.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDDriver.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDMapper.java PRE-CREATION
bq.   trunk/core/src/main/java/org/apache/mahout/classifier/sgd/mapreduce/SGDReducer.java PRE-CREATION
bq.   trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/AdaptiveLogisticRegressionMapReduceTest.java PRE-CREATION
bq.   trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/LogisticRegressionMapReduceTest.java PRE-CREATION
bq.   trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/PassiveAggressiveMapReduceTest.java PRE-CREATION
bq.   trunk/core/src/test/java/org/apache/mahout/classifier/sgd/mapreduce/SGDMapReduceTest.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3072/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. issei
bq.
bq.
> Implement SGD based classifiers using MapReduce
> -----------------------------------------------
>
> Key: MAHOUT-918
> URL: https://issues.apache.org/jira/browse/MAHOUT-918
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Affects Versions: 0.6
> Reporter: issei yoshida
> Attachments: MAHOUT-918.patch, design.pdf
>
>
> Implement SGD-based classifiers (Logistic Regression, Adaptive Logistic
> Regression and Passive-Aggressive) using MapReduce.
> They are implemented using the Iterative Parameter Mixtures algorithm, which is
> described in the following papers.
> http://research.google.com/pubs/pub36948.html
> http://aclweb.org/anthology-new/N/N10/N10-1069.pdf
> http://books.nips.cc/papers/files/nips22/NIPS2009_0345.pdf
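As a rough picture of the iterative parameter mixing idea referred to in the issue description: each round trains one model per data shard starting from the shared weights (the "map" step), then replaces the shared weights with the average of the per-shard results (the "reduce" step). The sketch below is a minimal single-process illustration for binary logistic regression, not the code in the attached patch; the class name, method names and the fixed learning rate are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical single-process sketch of iterative parameter mixing. */
final class IterativeParameterMixingSketch {
  private IterativeParameterMixingSketch() {}

  /** One SGD pass of binary logistic regression over a shard, starting from start. */
  static double[] trainShard(double[][] x, int[] y, double[] start, double learningRate) {
    double[] w = start.clone(); // never modify the shared weights in place
    for (int i = 0; i < x.length; i++) {
      double z = 0.0;
      for (int j = 0; j < w.length; j++) {
        z += w[j] * x[i][j];
      }
      double p = 1.0 / (1.0 + Math.exp(-z)); // predicted P(y = 1)
      double g = y[i] - p;                   // gradient of the log-likelihood w.r.t. z
      for (int j = 0; j < w.length; j++) {
        w[j] += learningRate * g * x[i][j];
      }
    }
    return w;
  }

  /** Runs the given number of rounds: train every shard from the shared weights, then average. */
  static double[] fit(List<double[][]> shardX, List<int[]> shardY,
                      int dim, int iterations, double learningRate) {
    double[] w = new double[dim];
    int shards = shardX.size();
    for (int t = 0; t < iterations; t++) {
      double[] mixed = new double[dim];
      for (int s = 0; s < shards; s++) {
        // "Map" step: independent training on one shard.
        double[] ws = trainShard(shardX.get(s), shardY.get(s), w, learningRate);
        // "Reduce" step: accumulate the unweighted average.
        for (int j = 0; j < dim; j++) {
          mixed[j] += ws[j] / shards;
        }
      }
      w = mixed;
    }
    return w;
  }
}
```

In a MapReduce setting each call to trainShard would correspond to one mapper and the averaging to the reducer, with the driver launching one job per round.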