spark git commit: [SPARK-23291][SPARK-23291][R][FOLLOWUP] Update SparkR migration note for

2018-05-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 56a52e0a5 -> 1c9c5de95 [SPARK-23291][SPARK-23291][R][FOLLOWUP] Update SparkR migration note for ## What changes were proposed in this pull request? This PR fixes the migration note for SPARK-23291 since it is going to be backported to 2.3.1.

spark git commit: [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starting position by 1 when calling Scala API

2018-05-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.3 f87785a76 -> 3a22feab4 [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starting position by 1 when calling Scala API ## What changes were proposed in this pull request? This PR backports

spark git commit: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of dataframe vectorized summarizer

2017-12-20 Thread yliang
Repository: spark Updated Branches: refs/heads/master 9c289a5cb -> d3ae3e1e8 [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of dataframe vectorized summarizer ## What changes were proposed in this pull request? Make several improvements in dataframe vectorized summarizer. 1. Make the

spark git commit: [SPARK-22810][ML][PYSPARK] Expose Python API for LinearRegression with huber loss.

2017-12-20 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0114c89d0 -> fb0562f34 [SPARK-22810][ML][PYSPARK] Expose Python API for LinearRegression with huber loss. ## What changes were proposed in this pull request? Expose Python API for _LinearRegression_ with _huber_ loss. ## How was this

spark git commit: [SPARK-3181][ML] Implement huber loss for LinearRegression.

2017-12-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2a29a60da -> 1e44dd004 [SPARK-3181][ML] Implement huber loss for LinearRegression. ## What changes were proposed in this pull request? MLlib ```LinearRegression``` supports _huber_ loss in addition to _leastSquares_ loss. The huber loss
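For readers unfamiliar with the loss named in the entry above, here is a minimal sketch of the textbook Huber loss next to the squared loss. It is not the MLlib implementation; the function names and the `delta` default are assumptions chosen for illustration only.

```python
def squared_loss(residual):
    """Half squared error, the usual least-squares loss for one residual."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.35):
    """Textbook Huber loss for one residual.

    Quadratic for |residual| <= delta, linear beyond it, so large
    residuals (outliers) are penalised less than under squared loss.
    `delta` is a hypothetical parameter name for the switch point.
    """
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


if __name__ == "__main__":
    for r in (0.5, 1.0, 5.0, -5.0):
        print(r, squared_loss(r), huber_loss(r, delta=1.0))
```

The two branches meet at `|residual| == delta`, which is what keeps the loss continuous while limiting the influence of outliers.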

spark git commit: [SPARK-21087][ML][FOLLOWUP] Sync SharedParamsCodeGen and sharedParams.

2017-12-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master 17cdabb88 -> b03af8b58 [SPARK-21087][ML][FOLLOWUP] Sync SharedParamsCodeGen and sharedParams. ## What changes were proposed in this pull request? #19208 modified ```sharedParams.scala```, but didn't generated by

spark git commit: [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

2017-12-12 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 9e2d96d1d -> 00cdb38dc [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound) ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-22289 add JSON

spark git commit: [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound)

2017-12-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master e6dc5f280 -> 10c27a655 [SPARK-22289][ML] Add JSON support for Matrix parameters (LR with coefficients bound) ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-22289 add JSON

spark git commit: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvaluatorSuite test data to data/mllib.

2017-11-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7475a9655 -> 3da3d7635 [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvaluatorSuite test data to data/mllib. ## What changes were proposed in this pull request? Move ```ClusteringEvaluatorSuite``` test data(iris) to data/mllib, to prevent

spark git commit: [SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator

2017-09-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master fedf6961b -> 5ac96854c [SPARK-21981][PYTHON][ML] Added Python interface for ClusteringEvaluator ## What changes were proposed in this pull request? Added Python interface for ClusteringEvaluator ## How was this patch tested? Manual

spark git commit: [MINOR][ML] Remove unnecessary default value setting for evaluators.

2017-09-19 Thread yliang
Repository: spark Updated Branches: refs/heads/master 8319432af -> 2f962422a [MINOR][ML] Remove unnecessary default value setting for evaluators. ## What changes were proposed in this pull request? Remove unnecessary default value setting for all evaluators, as we have set them in

spark git commit: [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.

2017-09-14 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 3a692e355 -> 51e5a821d [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest. ## What changes were proposed in this pull request? #19197 fixed double caching for MLlib algorithms, but missed PySpark ```OneVsRest```,

spark git commit: [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.

2017-09-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 66cb72d7b -> c76153cc7 [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest. ## What changes were proposed in this pull request? #19197 fixed double caching for MLlib algorithms, but missed PySpark ```OneVsRest```, this PR

spark git commit: [MINOR][DOC] Add missing call of `update()` in examples of PeriodicGraphCheckpointer & PeriodicRDDCheckpointer

2017-09-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 8d8641f12 -> 66cb72d7b [MINOR][DOC] Add missing call of `update()` in examples of PeriodicGraphCheckpointer & PeriodicRDDCheckpointer ## What changes were proposed in this pull request? forgot to call `update()` with `graph1` & `rdd1` in

spark git commit: [SPARK-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API

2017-09-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master dcbb22943 -> 8d8641f12 [SPARK-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API ## What changes were proposed in this pull request? Added LogisticRegressionTrainingSummary for

spark git commit: [SPARK-21690][ML] one-pass imputer

2017-09-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master ca00cc70d -> 0fa5b7cac [SPARK-21690][ML] one-pass imputer ## What changes were proposed in this pull request? Parallelize the computation of all columns. Performance tests: |numColumns| Mean(Old) | Median(Old) | Mean(RDD) | Median(RDD) |

spark git commit: [SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette.

2017-09-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master e2ac2f1c7 -> dd7816758 [SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette. ## What changes were proposed in this pull request? This PR adds the ClusteringEvaluator
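As background for the entry above, the following is a minimal sketch of the silhouette measure computed with squared Euclidean distance, following the textbook definition s(i) = (b - a) / max(a, b). It is a naive in-memory version, not the ClusteringEvaluator implementation; the function name is hypothetical, NumPy is assumed to be available, and the input is assumed to contain at least two clusters.

```python
import numpy as np


def silhouette_squared_euclidean(points, labels):
    """Mean silhouette score using squared Euclidean distance.

    Assumes at least two distinct cluster labels. Points in singleton
    clusters get a score of 0, a common convention.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = (diffs ** 2).sum(axis=2)

    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other points of the same cluster.
        a = dists[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(dists[i, labels == c].mean()
                for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(silhouette_squared_euclidean(pts, [0, 0, 1, 1]))
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points.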

spark git commit: [SPARK-21856] Add probability and rawPrediction to MLPC for Python

2017-09-11 Thread yliang
Repository: spark Updated Branches: refs/heads/master 828fab035 -> 4bab8f599 [SPARK-21856] Add probability and rawPrediction to MLPC for Python Probability and rawPrediction have been added to MultilayerPerceptronClassifier for Python. Add unit test. Author: Chunsheng Ji

spark git commit: [SPARK-21108][ML] convert LinearSVC to aggregator framework

2017-08-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 05af2de0f -> f3676d639 [SPARK-21108][ML] convert LinearSVC to aggregator framework ## What changes were proposed in this pull request? convert LinearSVC to new aggregator framework ## How was this patch tested? existing unit test.

spark git commit: [ML][MINOR] Make sharedParams update.

2017-08-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3c0c2d09c -> 342961905 [ML][MINOR] Make sharedParams update. ## What changes were proposed in this pull request? ```sharedParams.scala``` was generated by ```SharedParamsCodeGen```, but it's not updated in master. Maybe someone manual

spark git commit: [SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization.

2017-08-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master 84b5b16ea -> c108a5d30 [SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization. ## What changes were proposed in this pull request? MLlib ```LinearRegression/LogisticRegression/LinearSVC``` always standardize the data

spark git commit: [SPARK-19634][ML] Multivariate summarizer - dataframes API

2017-08-15 Thread yliang
Repository: spark Updated Branches: refs/heads/master 966083105 -> 07549b20a [SPARK-19634][ML] Multivariate summarizer - dataframes API ## What changes were proposed in this pull request? This patch adds the DataFrames API to the multivariate summarizer (mean, variance, etc.). In addition

spark git commit: [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search

2017-08-09 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 d02331452 -> 7446be332 [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.2 for an emergency bugfix in strong wolfe

spark git commit: [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search

2017-08-09 Thread yliang
Repository: spark Updated Branches: refs/heads/master ae8a2b149 -> b35660dd0 [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.2 for an emergency bugfix in strong wolfe line

spark git commit: [SPARK-21306][ML] For branch 2.0, OneVsRest should support setWeightCol

2017-08-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 c27a01aec -> 9f670ce5d [SPARK-21306][ML] For branch 2.0, OneVsRest should support setWeightCol The PR is related to #18554, and is modified for branch 2.0. ## What changes were proposed in this pull request? add `setWeightCol` method

spark git commit: [SPARK-21306][ML] For branch 2.1, OneVsRest should support setWeightCol

2017-08-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 444cca14d -> 9b749b6ce [SPARK-21306][ML] For branch 2.1, OneVsRest should support setWeightCol The PR is related to #18554, and is modified for branch 2.1. ## What changes were proposed in this pull request? add `setWeightCol` method

spark git commit: [SPARK-19270][FOLLOW-UP][ML] PySpark GLR model.summary should return a printable representation.

2017-08-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master fdcee028a -> f763d8464 [SPARK-19270][FOLLOW-UP][ML] PySpark GLR model.summary should return a printable representation. ## What changes were proposed in this pull request? PySpark GLR ```model.summary``` should return a printable

spark git commit: [SPARK-20601][ML] Python API for Constrained Logistic Regression

2017-08-02 Thread yliang
Repository: spark Updated Branches: refs/heads/master 14e75758a -> 845c039ce [SPARK-20601][ML] Python API for Constrained Logistic Regression ## What changes were proposed in this pull request? Python API for Constrained Logistic Regression based on #17922 , thanks for the original

spark git commit: [SPARK-21388][ML][PYSPARK] GBTs inherit from HasStepSize & LinearSVC from HasThreshold

2017-08-01 Thread yliang
Repository: spark Updated Branches: refs/heads/master 5fd0294ff -> 253a07e43 [SPARK-21388][ML][PYSPARK] GBTs inherit from HasStepSize & LinearSVC from HasThreshold ## What changes were proposed in this pull request? GBTs inherit from HasStepSize & LinearSVC/Binarizer from HasThreshold ##

spark git commit: [SPARK-21575][SPARKR] Eliminate needless synchronization in java-R serialization

2017-07-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 44e501ace -> 106eaa9b9 [SPARK-21575][SPARKR] Eliminate needless synchronization in java-R serialization ## What changes were proposed in this pull request? Remove surplus synchronized blocks. ## How was this patch tested? Unit tests run

spark git commit: Revert "[SPARK-21306][ML] OneVsRest should support setWeightCol"

2017-07-28 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 8520d7c6d -> 258ca40cf Revert "[SPARK-21306][ML] OneVsRest should support setWeightCol" This reverts commit 8520d7c6d5e880dea3c1a8a874148c07222b4b4b. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Revert "[SPARK-21306][ML] OneVsRest should support setWeightCol"

2017-07-28 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 ccb827224 -> f8ae2bdd2 Revert "[SPARK-21306][ML] OneVsRest should support setWeightCol" This reverts commit ccb82722450c20c9cdea2b2c68783943213a5aa1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-21306][ML] OneVsRest should support setWeightCol

2017-07-27 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 d7b9d6235 -> ccb827224 [SPARK-21306][ML] OneVsRest should support setWeightCol ## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit

spark git commit: [SPARK-21306][ML] OneVsRest should support setWeightCol

2017-07-27 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 94987987a -> 8520d7c6d [SPARK-21306][ML] OneVsRest should support setWeightCol ## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit

spark git commit: [SPARK-21306][ML] OneVsRest should support setWeightCol

2017-07-27 Thread yliang
Repository: spark Updated Branches: refs/heads/master f44ead89f -> a5a318997 [SPARK-21306][ML] OneVsRest should support setWeightCol ## What changes were proposed in this pull request? add `setWeightCol` method for OneVsRest. `weightCol` is ignored if classifier doesn't inherit HasWeightCol

spark git commit: [SPARK-19270][ML] Add summary table to GLM summary

2017-07-27 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2ff35a057 -> ddcd2e826 [SPARK-19270][ML] Add summary table to GLM summary ## What changes were proposed in this pull request? Add R-like summary table to GLM summary, which includes feature name (if it exists), parameter estimate, standard

spark git commit: [MINOR][ML] Reorg RFormula params.

2017-07-20 Thread yliang
Repository: spark Updated Branches: refs/heads/master 256358f66 -> 5d1850d4b [MINOR][ML] Reorg RFormula params. ## What changes were proposed in this pull request? There are mainly two reasons for this reorg: * Some params are placed in ```RFormulaBase```, while others are placed in

spark git commit: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid for both features and label column.

2017-07-15 Thread yliang
Repository: spark Updated Branches: refs/heads/master 74ac1fb08 -> 69e5282d3 [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid for both features and label column. ## What changes were proposed in this pull request? ```RFormula``` should handle invalid for both features and

spark git commit: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid

2017-07-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master aaad34dc2 -> d2d2a5de1 [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/StringIndexer/RFormula inherit from HasHandleInvalid ## What changes were proposed in this pull request? 1, HasHandleInvalid supports override 2, Make

spark git commit: [SPARK-21285][ML] VectorAssembler reports the column name of unsupported data type

2017-07-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7fcbb9b57 -> 56536e999 [SPARK-21285][ML] VectorAssembler reports the column name of unsupported data type ## What changes were proposed in this pull request? add the column name in the exception which is raised by unsupported data type.

spark git commit: [SPARK-21310][ML][PYSPARK] Expose offset in PySpark

2017-07-05 Thread yliang
Repository: spark Updated Branches: refs/heads/master a38643256 -> 4852b7d44 [SPARK-21310][ML][PYSPARK] Expose offset in PySpark ## What changes were proposed in this pull request? Add offset to PySpark in GLM as in #16699. ## How was this patch tested? Python test Author: actuaryzhang

spark git commit: [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data

2017-07-02 Thread yliang
Repository: spark Updated Branches: refs/heads/master c605fee01 -> c19680be1 [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data ## What changes were proposed in this pull request? This PR is to maintain API parity with changes made in SPARK-17498 to

spark git commit: [SPARK-18518][ML] HasSolver supports override

2017-07-01 Thread yliang
Repository: spark Updated Branches: refs/heads/master 37ef32e51 -> e0b047eaf [SPARK-18518][ML] HasSolver supports override ## What changes were proposed in this pull request? 1, make param support non-final with `finalFields` option 2, generate `HasSolver` with `finalFields = false` 3,

spark git commit: [SPARK-21275][ML] Update GLM test to use supportedFamilyNames

2017-07-01 Thread yliang
Repository: spark Updated Branches: refs/heads/master b1d719e7c -> 37ef32e51 [SPARK-21275][ML] Update GLM test to use supportedFamilyNames ## What changes were proposed in this pull request? Update GLM test to use supportedFamilyNames as suggested here:

spark git commit: [ML] Fix scala-2.10 build failure of GeneralizedLinearRegressionSuite.

2017-06-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3c2fc19d4 -> 528c9281a [ML] Fix scala-2.10 build failure of GeneralizedLinearRegressionSuite. ## What changes were proposed in this pull request? Fix scala-2.10 build failure of ```GeneralizedLinearRegressionSuite```. ## How was this

spark git commit: [SPARK-18710][ML] Add offset in GLM

2017-06-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 52981715b -> 49d767d83 [SPARK-18710][ML] Add offset in GLM ## What changes were proposed in this pull request? Add support for offset in GLM. This is useful for at least two reasons: 1. Account for exposure: e.g., when modeling the number

spark git commit: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should output reference category when encoding string terms

2017-06-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master 376d90d55 -> 0c8444cf6 [SPARK-14657][SPARKR][ML] RFormula w/o intercept should output reference category when encoding string terms ## What changes were proposed in this pull request? Please see

spark git commit: [SPARK-20899][PYSPARK] PySpark supports stringIndexerOrderType in RFormula

2017-05-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 35b644bd0 -> ff5676b01 [SPARK-20899][PYSPARK] PySpark supports stringIndexerOrderType in RFormula ## What changes were proposed in this pull request? PySpark supports stringIndexerOrderType in RFormula as in #17967. ## How was this patch

spark git commit: [SPARK-14659][ML] RFormula consistent with R when handling strings

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2dbe0c528 -> f47700c9c [SPARK-14659][ML] RFormula consistent with R when handling strings ## What changes were proposed in this pull request? When handling strings, the category dropped by RFormula and R are different: - RFormula drops the

spark git commit: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 9cbf39f1c -> e01f1f222 [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. ## What changes were proposed in this pull request? Expose numPartitions (expert) param of PySpark FPGrowth. ## How was this

spark git commit: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 913a6bfe4 -> 139da116f [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. ## What changes were proposed in this pull request? Expose numPartitions (expert) param of PySpark FPGrowth. ## How was this

spark git commit: [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 8896c4ee9 -> 9cbf39f1c [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth. ## What changes were proposed in this pull request? Follow-up for #17218, some minor fix for PySpark ```FPGrowth```. ## How was this patch tested?

spark git commit: [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth.

2017-05-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3f94e64aa -> 913a6bfe4 [SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth. ## What changes were proposed in this pull request? Follow-up for #17218, some minor fix for PySpark ```FPGrowth```. ## How was this patch tested? Existing

spark git commit: [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 4dd34d004 -> 72e1f83d7 [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel ## What changes were proposed in this pull request? Fixed TypeError with python3 and numpy 1.12.1. Numpy's

spark git commit: [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 f4538c95f -> 13adc0fc0 [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel ## What changes were proposed in this pull request? Fixed TypeError with python3 and numpy 1.12.1. Numpy's

spark git commit: [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 1d107242f -> 83aeac9e0 [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel ## What changes were proposed in this pull request? Fixed TypeError with python3 and numpy 1.12.1. Numpy's

spark git commit: [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1816eb3be -> bc66a77bb [SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel ## What changes were proposed in this pull request? Fixed TypeError with python3 and numpy 1.12.1. Numpy's `reshape` no
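The entries above concern a TypeError raised when a float is passed to `ndarray.reshape` under Python 3. A minimal sketch of the general pitfall and one way to avoid it follows; the function and variable names are hypothetical and this is not the code from the commit. In Python 3, `/` always returns a float, so a dimension computed with it should use integer division (`//`) instead.

```python
import numpy as np


def reshape_weights(weights, num_classes):
    """Reshape a flat weight array into (num_classes, num_cols).

    Uses `//` so the computed dimension is an int; with `/` it would be
    a float under Python 3, which recent NumPy versions reject.
    Assumes len(weights) is divisible by num_classes.
    """
    weights = np.asarray(weights, dtype=float)
    num_cols = weights.size // num_classes  # int, not float
    return weights.reshape(num_classes, num_cols)


if __name__ == "__main__":
    print(reshape_weights(range(6), 2))
```

If the size is not guaranteed to divide evenly, an explicit check before reshaping gives a clearer error than the one NumPy raises.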

spark git commit: [SPARK-20631][FOLLOW-UP] Fix incorrect tests.

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 e936a96ba -> 1d107242f [SPARK-20631][FOLLOW-UP] Fix incorrect tests. ## What changes were proposed in this pull request? - Fix incorrect tests for `_check_thresholds`. - Move test to `ParamTests`. ## How was this patch tested? Unit

spark git commit: [SPARK-20631][FOLLOW-UP] Fix incorrect tests.

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 9afcf127d -> 1816eb3be [SPARK-20631][FOLLOW-UP] Fix incorrect tests. ## What changes were proposed in this pull request? - Fix incorrect tests for `_check_thresholds`. - Move test to `ParamTests`. ## How was this patch tested? Unit

spark git commit: [SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 ee9d5975e -> e936a96ba [SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? Add test cases for

spark git commit: [SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master d76633e3c -> 9afcf127d [SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? Add test cases for PR-18062

spark git commit: [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary.

2017-05-23 Thread yliang
Repository: spark Updated Branches: refs/heads/master 442287ae2 -> ad09e4ca0 [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary. ## What changes were proposed in this pull request? Joint coefficients with intercept for SparkR linear SVM summary. ## How was

spark git commit: [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary.

2017-05-23 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 06c985c1b -> dbb068f4f [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary. ## What changes were proposed in this pull request? Joint coefficients with intercept for SparkR linear SVM summary. ## How

spark git commit: [SPARK-20764][ML][PYSPARK] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-22 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 a57553279 -> a0bf5c47c [SPARK-20764][ML][PYSPARK] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? SPARK-20097 exposed

spark git commit: [SPARK-20764][ML][PYSPARK] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version

2017-05-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master f3ed62a38 -> cfca01136 [SPARK-20764][ML][PYSPARK] Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR - Python version ## What changes were proposed in this pull request? SPARK-20097 exposed degreesOfFreedom

spark git commit: [SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and ml.stat.ChiSquareTest.

2017-05-17 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 b8fa79cec -> ba0117c27 [SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and ml.stat.ChiSquareTest. ## What changes were proposed in this pull request? Add docs and examples for ```ml.stat.Correlation``` and

spark git commit: [SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and ml.stat.ChiSquareTest.

2017-05-17 Thread yliang
Repository: spark Updated Branches: refs/heads/master 324a904d8 -> 697a5e551 [SPARK-20505][ML] Add docs and examples for ml.stat.Correlation and ml.stat.ChiSquareTest. ## What changes were proposed in this pull request? Add docs and examples for ```ml.stat.Correlation``` and

spark git commit: [SPARK-20707][ML] ML deprecated APIs should be removed in major release.

2017-05-15 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 10e599f69 -> a869e8bfd [SPARK-20707][ML] ML deprecated APIs should be removed in major release. ## What changes were proposed in this pull request? Before 2.2, MLlib kept removing APIs deprecated in the last feature/minor release. But

spark git commit: [SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive

2017-05-15 Thread yliang
Repository: spark Updated Branches: refs/heads/master b0888d1ac -> 9970aa096 [SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive ## What changes were proposed in this pull request? make param `family` in LoR and `optimizer` in LDA case insensitive ## How was this patch

spark git commit: [SPARK-20606][ML] Revert "[] ML 2.2 QA: Remove deprecated methods for ML"

2017-05-11 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 3eb0ee06a -> 80a57fa90 [SPARK-20606][ML] Revert "[] ML 2.2 QA: Remove deprecated methods for ML" This reverts commit b8733e0ad9f5a700f385e210450fd2c10137293e. Author: Yanbo Liang Closes #17944 from

spark git commit: [SPARK-20606][ML] Revert "[] ML 2.2 QA: Remove deprecated methods for ML"

2017-05-11 Thread yliang
Repository: spark Updated Branches: refs/heads/master 8ddbc431d -> 0698e6c88 [SPARK-20606][ML] Revert "[] ML 2.2 QA: Remove deprecated methods for ML" This reverts commit b8733e0ad9f5a700f385e210450fd2c10137293e. Author: Yanbo Liang Closes #17944 from

spark git commit: [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params

2017-05-10 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 46659974e -> d86dae8fe [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params ## What changes were proposed in this pull request? - Replace `getParam` calls with `getOrDefault` calls. -

spark git commit: [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params

2017-05-10 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0ef16bd4b -> 804949c6b [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params ## What changes were proposed in this pull request? - Replace `getParam` calls with `getOrDefault` calls. - Fix

spark git commit: [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params

2017-05-10 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 8e097890a -> 69786ea3a [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params ## What changes were proposed in this pull request? - Replace `getParam` calls with `getOrDefault` calls. -

spark git commit: [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params

2017-05-10 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 ef50a9548 -> 3ed2f4d51 [SPARK-20631][PYTHON][ML] LogisticRegression._checkThresholdConsistency should use values not Params ## What changes were proposed in this pull request? - Replace `getParam` calls with `getOrDefault` calls. -

spark git commit: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML

2017-05-09 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 4bbfad44e -> 4b7aa0b1d [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML ## What changes were proposed in this pull request? Remove ML methods we deprecated in 2.1. ## How was this patch tested? Existing tests. Author:

spark git commit: [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML

2017-05-09 Thread yliang
Repository: spark Updated Branches: refs/heads/master be53a7835 -> b8733e0ad [SPARK-20606][ML] ML 2.2 QA: Remove deprecated methods for ML ## What changes were proposed in this pull request? Remove ML methods we deprecated in 2.1. ## How was this patch tested? Existing tests. Author: Yanbo

spark git commit: [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column

2017-05-04 Thread yliang
Repository: spark Updated Branches: refs/heads/master bfc8c79c8 -> 0d16faab9 [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column ## What changes were proposed in this pull request? Bucketizer currently requires input column to be Double, but the logic should work on any

spark git commit: [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column

2017-05-04 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 425ed26d2 -> c8756288d [SPARK-20574][ML] Allow Bucketizer to handle non-Double numeric column ## What changes were proposed in this pull request? Bucketizer currently requires input column to be Double, but the logic should work on

spark git commit: [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up

2017-05-04 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 b6727795f -> 425ed26d2 [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up ## What changes were proposed in this pull request? Address some minor comments for #17715: * Put bound-constrained optimization params under

spark git commit: [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up

2017-05-04 Thread yliang
Repository: spark Updated Branches: refs/heads/master 57b64703e -> c5dceb8c6 [SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up ## What changes were proposed in this pull request? Address some minor comments for #17715: * Put bound-constrained optimization params under

spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests

2017-04-26 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 612952251 -> 34dec68d7 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with tiny dataset and tiny ```maxIter```, which means they are not converged.

spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests

2017-04-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7fecf5130 -> dbb06c689 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with tiny dataset and tiny ```maxIter```, which means they are not converged. I

spark git commit: [SPARK-18901][FOLLOWUP][ML] Require in LR LogisticAggregator is redundant

2017-04-25 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 b62ebd91b -> e2591c6d7 [SPARK-18901][FOLLOWUP][ML] Require in LR LogisticAggregator is redundant ## What changes were proposed in this pull request? This is a follow-up PR of #17478. ## How was this patch tested? Existing tests

spark git commit: [SPARK-18901][FOLLOWUP][ML] Require in LR LogisticAggregator is redundant

2017-04-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0bc7a9021 -> 387565cf1 [SPARK-18901][FOLLOWUP][ML] Require in LR LogisticAggregator is redundant ## What changes were proposed in this pull request? This is a follow-up PR of #17478. ## How was this patch tested? Existing tests Author:

spark git commit: [SPARK-18901][ML] Require in LR LogisticAggregator is redundant

2017-04-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.2 2bef01f64 -> cf16c3250 [SPARK-18901][ML] Require in LR LogisticAggregator is redundant ## What changes were proposed in this pull request? In MultivariateOnlineSummarizer, `add` and `merge` have check for weights and feature sizes.

spark git commit: [SPARK-18901][ML] Require in LR LogisticAggregator is redundant

2017-04-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 776a2c0e9 -> 90264aced [SPARK-18901][ML] Require in LR LogisticAggregator is redundant ## What changes were proposed in this pull request? In MultivariateOnlineSummarizer, `add` and `merge` have check for weights and feature sizes. The

spark git commit: [MINOR][SPARKR] Move 'Data type mapping between R and Spark' to right place in SparkR doc.

2017-03-27 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3fada2f50 -> 1d00761b9 [MINOR][SPARKR] Move 'Data type mapping between R and Spark' to right place in SparkR doc. Section ```Data type mapping between R and Spark``` is currently in the wrong place in the SparkR doc; we should move it

spark git commit: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it was called on executors.

2017-03-21 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 c4d2b8338 -> 277ed375b [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it was called on executors. ## What changes were proposed in this pull request? SparkR ```spark.getSparkFiles``` fails when it was called on

spark git commit: [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it was called on executors.

2017-03-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master c1e87e384 -> 478fbc866 [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles fails when it was called on executors. ## What changes were proposed in this pull request? SparkR ```spark.getSparkFiles``` fails when it was called on executors,

spark git commit: [SPARK-19806][ML][PYSPARK] PySpark GeneralizedLinearRegression supports tweedie distribution.

2017-03-08 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1fa58868b -> 81303f7ca [SPARK-19806][ML][PYSPARK] PySpark GeneralizedLinearRegression supports tweedie distribution. ## What changes were proposed in this pull request? PySpark ```GeneralizedLinearRegression``` supports tweedie

spark git commit: [SPARK-19745][ML] SVCAggregator captures coefficients in its closure

2017-03-02 Thread yliang
Repository: spark Updated Branches: refs/heads/master 8417a7ae6 -> 93ae176e8 [SPARK-19745][ML] SVCAggregator captures coefficients in its closure ## What changes were proposed in this pull request? JIRA: [SPARK-19745](https://issues.apache.org/jira/browse/SPARK-19745) Reorganize

spark git commit: [SPARK-19734][PYTHON][ML] Correct OneHotEncoder doc string to say dropLast

2017-03-01 Thread yliang
Repository: spark Updated Branches: refs/heads/master 3bd8ddf7c -> d2a879762 [SPARK-19734][PYTHON][ML] Correct OneHotEncoder doc string to say dropLast ## What changes were proposed in this pull request? Updates the doc string to match up with the code i.e. say dropLast instead of

spark git commit: [MINOR][ML] Fix comments in LSH Examples and Python API

2017-03-01 Thread yliang
Repository: spark Updated Branches: refs/heads/master de2b53df4 -> 3bd8ddf7c [MINOR][ML] Fix comments in LSH Examples and Python API ## What changes were proposed in this pull request? Remove `org.apache.spark.examples.` in Add slash in one of the python doc. ## How was this patch tested?

spark git commit: [MINOR][ML][DOC] Document default value for GeneralizedLinearRegression.linkPower

2017-02-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 410392ed7 -> 6ab60542e [MINOR][ML][DOC] Document default value for GeneralizedLinearRegression.linkPower Add Scaladoc for GeneralizedLinearRegression.linkPower default value Follow-up to https://github.com/apache/spark/pull/16344

spark git commit: [SPARK-18285][SPARKR] SparkR approxQuantile supports input multiple columns

2017-02-17 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1a3f5f8c5 -> b40659838 [SPARK-18285][SPARKR] SparkR approxQuantile supports input multiple columns ## What changes were proposed in this pull request? SparkR ```approxQuantile``` supports input multiple columns. ## How was this patch

spark git commit: [SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing

2017-02-15 Thread yliang
Repository: spark Updated Branches: refs/heads/master 21b4ba2d6 -> 08c1972a0 [SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing ## What changes were proposed in this pull request? This pull request includes the Python API and examples for LSH. The API changes were

spark git commit: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master 90817a6cd -> 4172ff80d [SPARK-18929][ML] Add Tweedie distribution in GLM ## What changes were proposed in this pull request? I propose to add the full Tweedie family into the GeneralizedLinearRegression model. The Tweedie family is

spark git commit: [SPARK-19313][ML][MLLIB] GaussianMixture should limit the number of features

2017-01-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 76db394f2 -> 0e821ec6f [SPARK-19313][ML][MLLIB] GaussianMixture should limit the number of features ## What changes were proposed in this pull request? The following test will fail on current master scala test("gmm fails on high

spark git commit: [SPARK-19155][ML] Make family case insensitive in GLM

2017-01-23 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 8daf10e3f -> 1e07a7192 [SPARK-19155][ML] Make family case insensitive in GLM ## What changes were proposed in this pull request? This is a supplement to PR #16516 which did not make the value from `getFamily` case insensitive. Current
