[
https://issues.apache.org/jira/browse/SPARK-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562199#comment-14562199
]
Joseph K. Bradley edited comment on SPARK-7529 at 5/28/15 3:49 AM:
-------------------------------------------------------------------
*spark.mllib: Issues found in a pass through the spark.mllib package*
h3. Classification
LogisticRegressionModel + SVMModel
* scala.Option<Object> getThreshold() *--> SHOULD FIX: make Java version?*
NaiveBayesModel
* "Java-friendly constructor": NaiveBayesModel(Iterable<Object> labels,
Iterable<Object> pi, Iterable<Iterable<Object>> theta) *--> SHOULD FIX*
** *TARGET 1.4*
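A minimal sketch of the kind of "Java version" suggested for getThreshold() above. Scala's Option[Double] reaches Java as Option<Object> because the primitive type argument is erased, so callers have to cast. Everything here is a hypothetical stand-in: java.util.Optional replaces scala.Option only so the example runs without Scala on the classpath, and ErasedModel and getThresholdJava are illustrative names, not Spark API.

```java
import java.util.Optional;

public class ThresholdSketch {
    /** Hypothetical stand-in for a model whose Scala getter erases to Option<Object>. */
    static final class ErasedModel {
        private final Double threshold; // null means "no threshold set"

        ErasedModel(Double threshold) {
            this.threshold = threshold;
        }

        // What a Java caller effectively sees: the element type is lost.
        Optional<Object> getThreshold() {
            return Optional.<Object>ofNullable(threshold);
        }
    }

    /** Hypothetical Java-friendly accessor: restores the Double type in one place. */
    static Optional<Double> getThresholdJava(ErasedModel model) {
        return model.getThreshold().map(t -> (Double) t);
    }

    public static void main(String[] args) {
        ErasedModel model = new ErasedModel(0.5);
        double viaCast = (Double) model.getThreshold().get(); // cast needed at every call site
        double typed = getThresholdJava(model).get();          // no cast needed
        System.out.println(viaCast + " " + typed);
    }
}
```

The design point is only that the cast lives in one wrapper instead of in every caller. Whether the real fix should return a nullable java.lang.Double or some other type is left open here.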
h3. Clustering
DistributedLDAModel
* RDD<scala.Tuple2<Object,Vector>> topicDistributions() *--> SHOULD FIX:
make Java version?*
GaussianMixtureModel + KMeansModel + NaiveBayesModel
* RDD<Object> predict(RDD<Vector> points) *--> SHOULD FIX with Java version*
StreamingKMeans *--> SHOULD FIX with Java versions*
* DStream<Object> predictOn(DStream<Vector> data)
* <K> DStream<scala.Tuple2<K,Object>>
predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K>
evidence$1)
h3. Evaluation
AreaUnderCurve *--> SHOULD FIX with Java versions*
* static double of(scala.collection.Iterable<scala.Tuple2<Object,Object>> curve)
* static double of(RDD<scala.Tuple2<Object,Object>> curve)
BinaryClassificationMetrics *--> SHOULD FIX*
* LOTS (everything taking/returning an RDD)
** *TARGET 1.4?*
h3. Feature
Word2VecModel
* scala.Tuple2<String,Object>[] findSynonyms *--> SHOULD FIX with class to
replace tuple?*
** *TARGET 1.4?*
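One way the "class to replace tuple" idea for findSynonyms could look, as a sketch only. Synonym and toSynonyms are hypothetical names, and Map.Entry<String,Object> stands in for scala.Tuple2<String,Object> so the example runs with the JDK alone.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymSketch {
    /** Hypothetical result class: a word and its similarity score, with real types. */
    static final class Synonym {
        private final String word;
        private final double similarity;

        Synonym(String word, double similarity) {
            this.word = word;
            this.similarity = similarity;
        }

        String getWord() { return word; }

        double getSimilarity() { return similarity; }
    }

    /** Converts erased (String, Object) pairs into typed Synonym values. */
    static List<Synonym> toSynonyms(List<Map.Entry<String, Object>> erased) {
        List<Synonym> out = new ArrayList<>(erased.size());
        for (Map.Entry<String, Object> e : erased) {
            // The cast happens once here rather than in every caller.
            out.add(new Synonym(e.getKey(), (Double) e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> erased = new ArrayList<>();
        erased.add(new SimpleImmutableEntry<String, Object>("queen", 0.8));
        erased.add(new SimpleImmutableEntry<String, Object>("prince", 0.7));
        for (Synonym s : toSynonyms(erased)) {
            System.out.println(s.getWord() + " " + s.getSimilarity());
        }
    }
}
```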
h3. Linalg
SparseMatrix
* static SparseMatrix fromCOO(int numRows, int numCols,
scala.collection.Iterable<scala.Tuple3<Object,Object,Object>> entries) *-->
SHOULD FIX with Java version?*
Vectors
* static Vector sparse(int size,
scala.collection.Seq<scala.Tuple2<Object,Object>> elements) *--> SHOULD FIX
with Java version?*
BlockMatrix *--> SHOULD FIX with Java versions*
* RDD<scala.Tuple2<scala.Tuple2<Object,Object>,Matrix>> blocks()
** _This issue appears in the constructors too._
h3. Optimization
_(lower priority because this is a DeveloperApi, which needs to be updated anyway)_
Optimizer
* Vector optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector
initialWeights)
* _Same issue appears elsewhere, wherever Double is used in a tuple._
Gradient
* scala.Tuple2<Vector,Object> compute(Vector data, double label, Vector
weights)
h3. Recommendation
MatrixFactorizationModel *--> SHOULD FIX with Java versions*
* _constructor_: MatrixFactorizationModel(int rank,
RDD<scala.Tuple2<Object,double[]>> userFeatures,
RDD<scala.Tuple2<Object,double[]>> productFeatures)
* RDD<scala.Tuple2<Object,double[]>> productFeatures()
* RDD<scala.Tuple2<Object,Rating[]>> recommendProductsForUsers(int num)
* RDD<scala.Tuple2<Object,Rating[]>> recommendUsersForProducts(int num)
* RDD<scala.Tuple2<Object,double[]>> userFeatures()
** *TARGET 1.4*
h3. Stats
Statistics *--> SHOULD FIX with Java versions*
* static double corr(RDD<Object> x, RDD<Object> y)
* static double corr(RDD<Object> x, RDD<Object> y, String method)
h3. Trees
DecisionTreeModel
* JavaRDD<Object> predict(JavaRDD<Vector> features) *--> SHOULD FIX*
** _This is because we use Double instead of java.lang.Double (unlike in, e.g.,
TreeEnsembleModel)._
** *TARGET 1.4, or never fix since old API bug?*
Split _(low priority, but should fix with Java version)_
* scala.collection.immutable.List<Object> categories()
h3. util
DataValidators *--> SHOULD FIX with Java versions*
* static scala.Function1<RDD<LabeledPoint>,Object> binaryLabelValidator()
* static scala.Function1<RDD<LabeledPoint>,Object> multiLabelValidator(int
k)
> Java compatibility check for MLlib 1.4
> --------------------------------------
>
> Key: SPARK-7529
> URL: https://issues.apache.org/jira/browse/SPARK-7529
> Project: Spark
> Issue Type: Sub-task
> Components: ML, MLlib
> Affects Versions: 1.4.0
> Reporter: Xiangrui Meng
> Assignee: Joseph K. Bradley
>
> Check Java compatibility for MLlib 1.4. We should create separate JIRAs for
> each possible issue.
> Checking compatibility means:
> * comparing with the Scala doc
> * verifying that Java docs are not messed up by Scala type incompatibilities
> (E.g., check for generic "Object" types where Java cannot understand complex
> Scala types. Also check Scala objects (especially with nesting!) carefully.)
> * If needed for complex issues, create small Java unit tests which execute
> each method. (The correctness can be checked in Scala.)
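The check for generic "Object" types described above could be partly automated with reflection. The following is a sketch under stated assumptions, not the procedure used for this issue: it assumes the compiled classes carry generic signatures in which erased primitives show up as Object, and FakeModel is a hypothetical stand-in for a Scala-compiled class. Wildcard and type-variable bounds are not inspected.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ErasedObjectScan {
    /** True if the type uses Object as a type argument at any nesting depth. */
    static boolean hasObjectTypeArgument(Type type) {
        if (type instanceof GenericArrayType) {
            // e.g. Tuple2<String,Object>[]: look at the component type.
            return hasObjectTypeArgument(((GenericArrayType) type).getGenericComponentType());
        }
        if (!(type instanceof ParameterizedType)) {
            return false;
        }
        for (Type arg : ((ParameterizedType) type).getActualTypeArguments()) {
            if (arg == Object.class || hasObjectTypeArgument(arg)) {
                return true;
            }
        }
        return false;
    }

    /** Sorted names of public declared methods whose signature has an Object type argument. */
    static List<String> suspiciousMethods(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) {
                continue;
            }
            boolean flagged = hasObjectTypeArgument(m.getGenericReturnType());
            for (Type p : m.getGenericParameterTypes()) {
                flagged = flagged || hasObjectTypeArgument(p);
            }
            if (flagged) {
                names.add(m.getName());
            }
        }
        Collections.sort(names); // getDeclaredMethods() has no guaranteed order
        return names;
    }

    /** Hypothetical stand-in for a Scala-compiled class as Java sees it. */
    public static class FakeModel {
        public List<Object> predict(List<double[]> points) { return new ArrayList<>(); }
        public double score(Map<String, Double> row) { return 0.0; }
        public Map<String, List<Object>> nested() { return null; }
        public Map.Entry<String, Object>[] findSynonyms() { return null; }
    }

    public static void main(String[] args) {
        System.out.println(suspiciousMethods(FakeModel.class));
    }
}
```

A scan like this only flags candidates. Each hit still needs a manual look, since Object can be a legitimate type argument.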