[
https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750720#comment-15750720
]
Yanbo Liang commented on SPARK-18862:
-------------------------------------
Yeah, it seems so, and I don't like that either. Maybe we could organize the
files the way Python does, except for tree-based algorithms (which support both
classification and regression):
* mllib-classification.R
* mllib-regression.R
* mllib-clustering.R
* mllib-feature.R
* mllib-tree.R
We would put the tree-based algorithms in a separate file and let the others
follow the Python layout. What do you think of this approach? Thanks.
> Split SparkR mllib.R into multiple files
> ----------------------------------------
>
> Key: SPARK-18862
> URL: https://issues.apache.org/jira/browse/SPARK-18862
> Project: Spark
> Issue Type: Improvement
> Components: ML, SparkR
> Reporter: Yanbo Liang
>
> SparkR mllib.R is getting bigger as we add more ML wrappers. I'd like to
> split it into multiple files to make it easier to maintain:
> * mllibClassification.R
> * mllibRegression.R
> * mllibClustering.R
> * mllibFeature.R
> or:
> * mllib/classification.R
> * mllib/regression.R
> * mllib/clustering.R
> * mllib/features.R
> By R convention, the first way is preferred, and I'm not sure whether R
> supports the second layout (will check later). Please let me know your
> preference. I think the start of a new release cycle is a good opportunity to
> do this, since it will involve fewer conflicts. If this proposal is
> approved, I can work on it.
> cc [~felixcheung] [~josephkb] [~mengxr]