[GitHub] spark pull request #21492: [SPARK-24300][ML] change the way to set seed in m...

2018-06-04 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21492 [SPARK-24300][ML] change the way to set seed in ml.cluster.LDASuite.generateLDAData ## What changes were proposed in this pull request? Using different RNG in all different

[GitHub] spark issue #21340: [SPARK-24115] Have logging pass through instrumentation ...

2018-05-17 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/21340 Thanks for the PR. LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-16 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/21344 LGTM.

[GitHub] spark pull request #21347: [SPARK-24290][ML] add support for Array input for...

2018-05-16 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21347 [SPARK-24290][ML] add support for Array input for instrumentation.logNamedValue ## What changes were proposed in this pull request? Extend instrumentation.logNamedValue to support

[GitHub] spark pull request #21335: [SPARK-24231][PYSPARK][ML] Provide Python API for...

2018-05-15 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21335 [SPARK-24231][PYSPARK][ML] Provide Python API for evaluateEachIteration for spark.ml GBTs ## What changes were proposed in this pull request? Add evaluateEachIteration

[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-14 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/21183 I tested loading models saved with Spark 2.3, and they load fine with this change. For the tests in LDASuite, I do see them fail sometimes without this fix. It will not always

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-09 Thread ludatabricks
Github user ludatabricks commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r187144226 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark pull request #21195: [Spark-23975][ML] Add support of array input for ...

2018-05-07 Thread ludatabricks
Github user ludatabricks commented on a diff in the pull request: https://github.com/apache/spark/pull/21195#discussion_r186566521 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -323,4 +324,21 @@ class LDASuite extends SparkFunSuite

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread ludatabricks
Github user ludatabricks commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185894432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -423,6 +423,8 @@ class GaussianMixture @Since("

[GitHub] spark issue #21204: [SPARK-24132][ML] Instrumentation improvement for classi...

2018-05-03 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/21204 LGTM. Retest this please.

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-03 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/13493 LGTM. Retest this please.

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrument improvements for clu...

2018-05-02 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21218 [SPARK-24155][ML] Instrument improvements for clustering ## What changes were proposed in this pull request? changed the instrument for all of the clustering methods ## How

[GitHub] spark pull request #21204: [SPARK-24132][ML]Expand instrumentation for class...

2018-05-01 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21204 [SPARK-24132][ML]Expand instrumentation for classification ## What changes were proposed in this pull request? - Add OptionalInstrumentation as argument for getNumClasses

[GitHub] spark pull request #21195: [Spark 23975][ML] Add support of array input for ...

2018-04-30 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21195 [Spark 23975][ML] Add support of array input for all clustering methods ## What changes were proposed in this pull request? Add support for all of the clustering methods

[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-04-27 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21183 [SPARK-22210][ML] Add seed for LDA variationalTopicInference ## What changes were proposed in this pull request? - Add seed parameter for variationalTopicInference - Add seed

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-04-24 Thread ludatabricks
Github user ludatabricks commented on the issue: https://github.com/apache/spark/pull/13493 The bug is confirmed. The fix looks pretty reasonable to me. Ping @jkbradley.

[GitHub] spark pull request #21081: [SPARK-23975][ML]Allow Clustering to take Arrays ...

2018-04-16 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21081 [SPARK-23975][ML]Allow Clustering to take Arrays of Double as input features ## What changes were proposed in this pull request? - Multiple possible input types are added

[GitHub] spark pull request #21044: Add RawPrediction, numClasses, and numFeatures fo...

2018-04-11 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21044 Add RawPrediction, numClasses, and numFeatures for OneVsRestModel. Add RawPrediction as output column; add numClasses and numFeatures to OneVsRestModel. ## What changes were

[GitHub] spark pull request #21015: [SPARK-23944][ML] Add the set method for the two ...

2018-04-09 Thread ludatabricks
GitHub user ludatabricks opened a pull request: https://github.com/apache/spark/pull/21015 [SPARK-23944][ML] Add the set method for the two LSHModel ## What changes were proposed in this pull request? Add two set methods for LSHModel in LSH.scala