[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21153

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r188132284

--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

Thanks.
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r187129517

--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

typo: default
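The docstring under review lists the fields of the saved metadata file. A small sketch of what such a record could look like may help; every value below is hypothetical (not taken from a real save), and the new field is spelled `defaultParamMap` as intended:

```python
import json

# Hypothetical metadata record an ML writer might produce after this change;
# class name, timestamp, uid, and param values are all illustrative.
metadata = {
    "class": "pyspark.ml.classification.LogisticRegression",
    "timestamp": 1525132800000,
    "sparkVersion": "2.4.0",
    "uid": "LogisticRegression_4e9f",
    "paramMap": {"maxIter": 50},           # user-supplied values only
    "defaultParamMap": {"regParam": 0.0},  # defaults, saved separately since 2.4.0
}
print(json.dumps(metadata, sort_keys=True))
```

Keeping user-supplied values and defaults in separate sections is what lets a later Spark version change a default without silently overriding what the user explicitly set.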
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r185262870

--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
     return argspec

+def majorMinorVersion(version):
--- End diff --

Yes please, a few things:
* throw exception when unable to parse
* make it a static method of a class pyspark.util.VersionUtils
* add unit test
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r185159691

--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
     return argspec

+def majorMinorVersion(version):
--- End diff --

> Also, shall we make this match the Scala API?

Do you mean to throw an exception when unable to parse the Spark version?
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r185131834

--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
     return argspec

+def majorMinorVersion(version):
--- End diff --

Since this affects Spark core, it might be nice to put this in a separate PR. Also, shall we make this match the Scala API?
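What jkbradley asks for later in this thread — a static method on a `pyspark.util.VersionUtils` class that raises when parsing fails, plus a unit test — could look roughly like the sketch below. The regex and error message are assumptions for illustration, not the committed implementation:

```python
import re


class VersionUtils:
    """Sketch of a version-parsing utility; the class and method names follow
    the review suggestion, but the body is illustrative only."""

    @staticmethod
    def majorMinorVersion(sparkVersion):
        """Parse a version string like '2.4.0' into (major, minor),
        raising ValueError (rather than returning None) on malformed input."""
        m = re.search(r"^(\d+)\.(\d+)(\..*)?$", sparkVersion)
        if m is None:
            raise ValueError(
                "Unable to parse Spark version string: " + sparkVersion)
        return (int(m.group(1)), int(m.group(2)))
```

With this shape, the requested unit test reduces to asserting the returned tuple for a few well-formed versions and asserting that a garbage string raises.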
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184693332

--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
         """
         uid = instance.uid
         cls = instance.__module__ + '.' + instance.__class__.__name__
-        params = instance.extractParamMap()
+
+        # User-supplied param values
+        params = instance._paramMap
         jsonParams = {}
         if paramMap is not None:
             jsonParams = paramMap
         else:
             for p in params:
                 jsonParams[p.name] = params[p]
+
+        # Default param values
+        jsonDefaultParams = {}
+        for p in instance._defaultParamMap:
+            jsonDefaultParams[p.name] = instance._defaultParamMap[p]
--- End diff --

ditto.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184693307

--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
         """
         uid = instance.uid
         cls = instance.__module__ + '.' + instance.__class__.__name__
-        params = instance.extractParamMap()
+
+        # User-supplied param values
+        params = instance._paramMap
         jsonParams = {}
         if paramMap is not None:
             jsonParams = paramMap
         else:
             for p in params:
                 jsonParams[p.name] = params[p]
--- End diff --

`_paramMap`'s keys are `Param` objects, not strings.
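viirya's objection can be seen with a toy example: because the map is keyed by `Param` objects rather than strings, a plain `.copy()` would not be JSON-serializable, so the values must be re-keyed by `p.name` first. The `Param` class below is a hypothetical stand-in for the real one, reduced to just a name:

```python
import json


class Param:
    """Toy stand-in for pyspark.ml.param.Param, keeping only the name."""
    def __init__(self, name):
        self.name = name


# A Param-keyed map, shaped like instance._defaultParamMap would be.
defaultParamMap = {Param("maxIter"): 100, Param("tol"): 1e-6}

# json.dumps(defaultParamMap) would raise TypeError, since the keys are
# arbitrary objects. Re-keying by name yields the JSON-friendly dict the
# metadata file needs.
jsonDefaultParams = {p.name: v for p, v in defaultParamMap.items()}
print(json.dumps(jsonDefaultParams, sort_keys=True))
```

This is why a simple `.copy()` of either `_paramMap` or `_defaultParamMap` is not enough on the save path.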
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184632888

--- Diff: python/pyspark/ml/util.py ---
@@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata):
         """
         Extract Params from metadata, and set them in the instance.
         """
+        # User-supplied param values
         for paramName in metadata['paramMap']:
             param = instance.getParam(paramName)
             paramValue = metadata['paramMap'][paramName]
             instance.set(param, paramValue)

+        # Default param values
+        majorAndMinorVersions = majorMinorVersion(metadata['sparkVersion'])
+        assert majorAndMinorVersions is not None, "Error loading metadata: Expected " + \
+            "Spark version string but found {}".format(metadata['sparkVersion'])
+
+        major = majorAndMinorVersions[0]
+        minor = majorAndMinorVersions[1]
+        # For metadata file prior to Spark 2.4, there is no default section.
+        if major > 2 or (major == 2 and minor >= 4):
+            assert 'defaultParamMap' in metadata, "Error loading metadata: Expected " + \
+                "`defaultParamMap` section not found"
+
+            for paramName in metadata['defaultParamMap']:
+                param = instance.getParam(paramName)
+                paramValue = metadata['defaultParamMap'][paramName]
+                instance._setDefault(**{param.name: paramValue})
--- End diff --

Oh, right. My bad.
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184626842

--- Diff: python/pyspark/ml/util.py ---
@@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata):
         """
         Extract Params from metadata, and set them in the instance.
         """
+        # User-supplied param values
        for paramName in metadata['paramMap']:
             param = instance.getParam(paramName)
             paramValue = metadata['paramMap'][paramName]
             instance.set(param, paramValue)

+        # Default param values
+        majorAndMinorVersions = majorMinorVersion(metadata['sparkVersion'])
+        assert majorAndMinorVersions is not None, "Error loading metadata: Expected " + \
+            "Spark version string but found {}".format(metadata['sparkVersion'])
+
+        major = majorAndMinorVersions[0]
+        minor = majorAndMinorVersions[1]
+        # For metadata file prior to Spark 2.4, there is no default section.
+        if major > 2 or (major == 2 and minor >= 4):
+            assert 'defaultParamMap' in metadata, "Error loading metadata: Expected " + \
+                "`defaultParamMap` section not found"
+
+            for paramName in metadata['defaultParamMap']:
+                param = instance.getParam(paramName)
+                paramValue = metadata['defaultParamMap'][paramName]
+                instance._setDefault(**{param.name: paramValue})
--- End diff --

Remove the line `param = instance.getParam(paramName)` and change this line to `instance._setDefault(**{paramName: paramValue})`.
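WeichenXu123's simplification relies on `_setDefault` taking keyword arguments, so the metadata's string key can be expanded directly with `**{paramName: paramValue}` and no `Param` lookup via `getParam` is needed. A toy sketch of the loop (the class below is a hypothetical stand-in for the Params mixin, storing defaults in a name-keyed dict):

```python
class ToyParams:
    """Hypothetical stand-in for the pyspark Params mixin, reduced to a
    name-keyed default map and a kwargs-based _setDefault."""
    def __init__(self):
        self._defaultParamMap = {}

    def _setDefault(self, **kwargs):
        self._defaultParamMap.update(kwargs)


metadata = {"defaultParamMap": {"maxIter": 100, "tol": 1e-6}}

instance = ToyParams()
for paramName in metadata["defaultParamMap"]:
    paramValue = metadata["defaultParamMap"][paramName]
    # The suggested form: the string key from the JSON metadata is expanded
    # straight into a keyword argument, no getParam lookup required.
    instance._setDefault(**{paramName: paramValue})
print(instance._defaultParamMap)
```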
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620777

--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
         """
         uid = instance.uid
         cls = instance.__module__ + '.' + instance.__class__.__name__
-        params = instance.extractParamMap()
+
+        # User-supplied param values
+        params = instance._paramMap
         jsonParams = {}
         if paramMap is not None:
             jsonParams = paramMap
         else:
             for p in params:
                 jsonParams[p.name] = params[p]
--- End diff --

I think using `_paramMap.copy()` would be simpler.
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620855

--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
         """
         uid = instance.uid
         cls = instance.__module__ + '.' + instance.__class__.__name__
-        params = instance.extractParamMap()
+
+        # User-supplied param values
+        params = instance._paramMap
         jsonParams = {}
         if paramMap is not None:
             jsonParams = paramMap
         else:
             for p in params:
                 jsonParams[p.name] = params[p]
+
+        # Default param values
+        jsonDefaultParams = {}
+        for p in instance._defaultParamMap:
+            jsonDefaultParams[p.name] = instance._defaultParamMap[p]
--- End diff --

Similarly, use `_defaultParamMap.copy()`.
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21153

[SPARK-24058][ML][PySpark] Default Params in ML should be saved separately: Python API

## What changes were proposed in this pull request?

See SPARK-23455 for reference. Default params in ML are now saved separately in the metadata file on the Scala side. We must make the same change for Python in Spark 2.4.0 in order to keep the two implementations in sync.

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24058

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21153.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21153

commit 0e9b18a7913c65e8a9b1c2304e2565a5ae9fbfbd
Author: Liang-Chi Hsieh
Date: 2018-04-25T09:58:52Z

    Default Params in ML should be saved separately in Python API.