[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-228:
-------------------------------
Attachment: MAHOUT-228.patch
Updated patch.
This patch includes:
- ability to run and test logistic models from the Mahout command-line interface
- AUC computation
- algorithmic improvements
- ability to save and restore logistic regression models and input reading
parameters
- small sample data, bundled as a resource, for go/no-go testing of the compile
process and for a classification quick-start
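To illustrate the AUC item above, here is a minimal sketch of one common way to compute AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half. This is not the code in the patch; the class name `AucSketch` and the method `auc` are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```java
// Hypothetical illustration only; not taken from the MAHOUT-228 patch.
final class AucSketch {
    private AucSketch() {}

    /**
     * Computes AUC as the probability that a randomly chosen positive
     * example (label 1) is scored higher than a randomly chosen negative
     * example (label 0). Tied scores contribute 0.5.
     * Runs in O(positives * negatives) time.
     */
    static double auc(double[] scores, int[] labels) {
        if (scores.length != labels.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) {
                continue; // outer loop walks positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) {
                    continue; // inner loop walks negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }
}
```

A value of 1.0 means every positive outranks every negative, and 0.5 is what random scoring would give. For large test sets a sort-based (rank-sum) version would likely be preferable.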
Defects include:
- many copyright notices missing
- limited real-life testing
- missing several of Olivier's improvements
- no numerical or speed optimizations yet
- stuff
Near- and medium-term plans include:
- test on some more realistic data
- throw away some defunct code
- first commit
- wiki page for quick-start
- magic knob tuning for learning parameters via evolutionary algorithms
Overall, this is getting close to useful for friendly users on non-critical
data.
> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
> Key: MAHOUT-228
> URL: https://issues.apache.org/jira/browse/MAHOUT-228
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Ted Dunning
> Fix For: 0.4
>
> Attachments: logP.csv, MAHOUT-228-3.patch, MAHOUT-228.patch,
> MAHOUT-228.patch, MAHOUT-228.patch, MAHOUT-228.patch, r.csv,
> sgd-derivation.pdf, sgd-derivation.tex, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable
> learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a
> reasonable place to start.
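For readers unfamiliar with the technique named in the issue description, the following is a minimal sketch of sequential logistic regression trained by SGD. It assumes a fixed learning rate, dense feature vectors, and no regularization; it is not the implementation in the attached patch, and the names `SgdLogisticSketch`, `train`, and `predict` are hypothetical.

```java
// Hypothetical illustration only; not taken from the MAHOUT-228 patch.
final class SgdLogisticSketch {
    private final double[] weights;     // one weight per feature, starts at zero
    private final double learningRate;  // fixed step size (an assumption of this sketch)

    SgdLogisticSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Logistic (sigmoid) link function. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns the estimated probability that the label is 1. */
    double predict(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * x[i];
        }
        return sigmoid(dot);
    }

    /**
     * One SGD step on a single example: move the weights along the
     * gradient of the log-likelihood, which is (y - p) * x.
     */
    void train(double[] x, int y) {
        double error = y - predict(x);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
    }
}
```

Each call to `train` touches one example and needs no access to the rest of the data, which is why the approach works in a single sequential pass. A constant feature of 1.0 can be included in `x` to act as an intercept term.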