[ https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407985#comment-15407985 ]
ASF GitHub Bot commented on MAHOUT-1856:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

https://github.com/apache/mahout/pull/246

[MAHOUT-1856][WIP] Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms

Relevant JIRA: [https://issues.apache.org/jira/browse/MAHOUT-1856](https://issues.apache.org/jira/browse/MAHOUT-1856)

Readme.md provides a more comprehensive (yet still incomplete) overview.

Key Points:

- Top Level Class: Model has one method, fit, and exposes coefs.
- Transformers map a vector input to a vector output (same or different length).
- Regressors map a vector input to a single output (e.g. a Double).
- Classifiers extend Transformers that produce a probability vector, 'selecting' the class and returning its label (instead of the entire p-vector).
- Pipelines and Ensembles are models as well, except that they are composed from the other models listed above, or from other pipelines and ensembles.

ToDo:

- [ ] All models need a uniform way to expose their tuning parameters; this will be required for an auto-tuning algo.
- [ ] Pipelines / Ensembles must be able to account for and report the tunable parameters of their sub-models.
- [ ] Need fitness functions.
- [ ] Native method wrappers: underlying engines and third-party packages already implement many ML models, so let's not reinvent the wheel by exposing YET ANOTHER SGD algorithm. Instead we should be able to convert a matrix to the format the 'other' library expects, run the model, get the results, package them back into a matrix, and pass that on in the pipeline or ensemble. (This is especially useful for DeepLearning4J integration.) Also, native implementations of some algos on an engine are probably more efficient than the ones we would write, because they can leverage engine-specific tricks (think Flink delta iterations).
- [ ] Lots more, open for discussion.

This is merely a conversation starter on what to do.
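The hierarchy in the key points can be sketched in a few lines. This is only an illustration of the structure described, not the code in the pull request: it is written in Python rather than Mahout's Scala, and only `Model`, `fit`, and `coefs` come from the description. Every other name (`transform`, `predict`, `classify`, the `targets` argument, `MaxAbsScaler`, `SimpleOLS`) is an assumption made for the example.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence

Vector = List[float]


class Model(ABC):
    """Top-level type: a single method, fit, plus the fitted coefs."""

    def __init__(self) -> None:
        self.coefs: Vector = []

    @abstractmethod
    def fit(self, rows: Sequence[Vector],
            targets: Optional[Sequence[float]] = None) -> "Model":
        """Learn coefs from the data and return self."""


class Transformer(Model):
    """Maps a vector to a vector of the same or a different length."""

    @abstractmethod
    def transform(self, x: Vector) -> Vector:
        """Return the transformed vector."""


class Regressor(Model):
    """Maps a vector to a single output."""

    @abstractmethod
    def predict(self, x: Vector) -> float:
        """Return the predicted value."""


class Classifier(Transformer):
    """A Transformer yielding a probability vector; classify picks the label."""

    def __init__(self, labels: Sequence[str]) -> None:
        super().__init__()
        self.labels = list(labels)

    def classify(self, x: Vector) -> str:
        probs = self.transform(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best]


class MaxAbsScaler(Transformer):
    """Hypothetical transformer: divides each column by its max absolute value."""

    def fit(self, rows, targets=None):
        width = len(rows[0])
        # Fall back to 1.0 for an all-zero column to avoid dividing by zero.
        self.coefs = [max(abs(r[j]) for r in rows) or 1.0 for j in range(width)]
        return self

    def transform(self, x):
        return [v / c for v, c in zip(x, self.coefs)]


class SimpleOLS(Regressor):
    """Hypothetical regressor: least squares on the first feature only.

    Assumes the first feature is not constant across rows.
    """

    def fit(self, rows, targets=None):
        if targets is None:
            raise ValueError("a regressor needs targets")
        xs = [r[0] for r in rows]
        n = len(xs)
        mx, my = sum(xs) / n, sum(targets) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, targets))
        slope = sxy / sxx
        self.coefs = [my - slope * mx, slope]  # [intercept, slope]
        return self

    def predict(self, x):
        return self.coefs[0] + self.coefs[1] * x[0]


class Pipeline(Regressor):
    """A model composed of other models: transformers, then a final regressor."""

    def __init__(self, transformers: Sequence[Transformer], final: Regressor) -> None:
        super().__init__()
        self.transformers = list(transformers)
        self.final = final

    def fit(self, rows, targets=None):
        data = [list(r) for r in rows]
        for t in self.transformers:
            t.fit(data, targets)
            data = [t.transform(r) for r in data]
        self.final.fit(data, targets)
        self.coefs = list(self.final.coefs)
        return self

    def predict(self, x):
        for t in self.transformers:
            x = t.transform(x)
        return self.final.predict(x)


# Usage: the training data follow y = 1 + 2x.
pipeline = Pipeline([MaxAbsScaler()], SimpleOLS())
pipeline.fit([[1.0], [2.0], [4.0]], [3.0, 5.0, 9.0])
prediction = pipeline.predict([3.0])  # approximately 7.0
```

Because `Pipeline` is itself a `Regressor` in this sketch, one pipeline can serve as the final step of another, which is one way to read the point that pipelines may be composed from other pipelines.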
I've included OLS as an example regressor and a normalizer as an example transformer, for illustrative purposes only. I really don't want to pack too many algos into this initial commit, just an example / proof of concept so we can say, "yeah, this framework makes sense for this kind of model" OR "ooh, we probably want to have these features too."

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rawkintrevo/mahout mahout-1856

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #246

----
commit 6c0f6bd322a50341bcc587750146467f9ff3fa0a
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-01T00:08:16Z

    [MAHOUT-1856] ML Algo Framework

commit 1f04cd5436df12ded23b8a1815b93ce73ea2a32a
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-02T17:22:48Z

    Building framework

commit 33b90c9795bbb1ff381a98045b0d5f2b641693a9
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-02T23:09:30Z

    add placeholders for ensemble pipeline and fitness test

commit 83c6068e2aa18a62f6ae8b84169a018f764ab408
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T14:54:32Z

    added readme

commit 52e9c3e1df4db1397ab81bf07c0e191cfd229b1a
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T14:58:59Z

    fixed readme image

commit 92ceeb9603ff9c4927214b896c4dbcfc63f8c7c4
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T15:04:11Z

    fixed readme image

commit c0b0464f45470375d709ef9475d474440411879f
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T15:04:52Z

    fixed readme image

commit 6f0228aa7ff349cd8ff5c10a4dafe55ec2037ee4
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-04T15:36:53Z

    removed autogen comments from files

commit 065fb24068e5e98b24f4f53ab8cb312abfb8b9ed
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-01T00:08:16Z

    [MAHOUT-1856] ML Algo Framework

commit 127d5dec29ac8b7d6ad3a12c494d4ccdae24cd31
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-02T17:22:48Z

    Building framework

commit 557af2ee7bec17b176c6def768ea6d3da8495b42
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-02T23:09:30Z

    add placeholders for ensemble pipeline and fitness test

commit bde4c940f3e540ffb2e8eceb87355638ca157f89
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T14:54:32Z

    added readme

commit 565a164082b3c00294db2a4bd1a0b001d561d6f9
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T14:58:59Z

    fixed readme image

commit 950027c047021c23f44af64b842bcbc1bbd717f9
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T15:04:11Z

    fixed readme image

commit 045192146e290d9762f09e4235dd4c2f947891d4
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-03T15:04:52Z

    fixed readme image

commit f65d7a941f666d0a58d56ac642558dd15fb57cd7
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-04T15:36:53Z

    removed autogen comments from files

commit 842db7ec3c21e5a4d1d152f1150b0dc97e5f44e7
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2016-08-04T15:38:19Z

    Merge branch 'mahout-1856' of https://github.com/rawkintrevo/mahout into mahout-1856

----

> Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1856
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1856
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.1
>            Reporter: Andrew Palumbo
>            Assignee: Trevor Grant
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> To ensure that Mahout does not become "A loose bag of algorithms", create basic traits with functions common to each class of algorithm.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)