[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21153


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r188132284
  
--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

Thanks.


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r187129517
  
--- Diff: python/pyspark/ml/util.py ---
@@ -396,6 +397,7 @@ def saveMetadata(instance, path, sc, extraMetadata=None, paramMap=None):
 - sparkVersion
 - uid
 - paramMap
+- defalutParamMap (since 2.4.0)
--- End diff --

typo: default


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-05-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r185262870
  
--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
 return argspec
 
 
+def majorMinorVersion(version):
--- End diff --

Yes please, a few things:
* throw exception when unable to parse
* make it a static method of a class pyspark.util.VersionUtils
* add unit test
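
Putting the three requests together, the helper might look roughly like this (a sketch of the suggested `pyspark.util.VersionUtils`, not the final API; the regex and error message are illustrative assumptions):

```python
import re


class VersionUtils(object):
    """Utilities for working with Spark version strings (sketch)."""

    @staticmethod
    def majorMinorVersion(sparkVersion):
        """
        Extract the (major, minor) tuple from a version string such as
        '2.4.0' or '3.0.0-SNAPSHOT'. Raises ValueError when the string
        cannot be parsed, matching the requested behavior.
        """
        m = re.search(r'^(\d+)\.(\d+)(\..*)?$', sparkVersion)
        if m is None:
            raise ValueError("Unable to parse Spark version string: %r" % sparkVersion)
        return (int(m.group(1)), int(m.group(2)))
```

A unit test would then cover both the happy path and the `ValueError` case, e.g. `VersionUtils.majorMinorVersion("2.4.0") == (2, 4)`.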


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r185159691
  
--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
 return argspec
 
 
+def majorMinorVersion(version):
--- End diff --

> Also, shall we make this match the Scala API?

Do you mean to throw an exception when unable to parse the Spark version?


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r185131834
  
--- Diff: python/pyspark/util.py ---
@@ -61,6 +62,26 @@ def _get_argspec(f):
 return argspec
 
 
+def majorMinorVersion(version):
--- End diff --

Since this affects Spark core, it might be nice to put this in a separate PR. Also, shall we make this match the Scala API?


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184693332
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
+
+# Default param values
+jsonDefaultParams = {}
+for p in instance._defaultParamMap:
+jsonDefaultParams[p.name] = instance._defaultParamMap[p]
--- End diff --

ditto.


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184693307
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
--- End diff --

`_paramMap`'s keys are `Param` objects, not strings.
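
To illustrate the point: because the keys are `Param` objects, serialization has to map each key to its `.name` before the dict is JSON-friendly. A minimal standalone sketch (using a stand-in `Param` class, since the real one lives in `pyspark.ml.param`):

```python
class Param(object):
    """Stand-in for pyspark.ml.param.Param; only the attribute used here."""
    def __init__(self, name):
        self.name = name


# A _paramMap-style dict: keys are Param objects, not strings.
paramMap = {Param("maxIter"): 10, Param("regParam"): 0.1}

# JSON needs string keys, so convert each Param key to its name,
# which is what the loop in the diff above does.
jsonParams = {p.name: v for p, v in paramMap.items()}
```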


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184632888
  
--- Diff: python/pyspark/ml/util.py ---
@@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata):
 """
 Extract Params from metadata, and set them in the instance.
 """
+# User-supplied param values
 for paramName in metadata['paramMap']:
 param = instance.getParam(paramName)
 paramValue = metadata['paramMap'][paramName]
 instance.set(param, paramValue)
 
+# Default param values
+majorAndMinorVersions = majorMinorVersion(metadata['sparkVersion'])
+assert majorAndMinorVersions is not None, "Error loading metadata: Expected " + \
+"Spark version string but found {}".format(metadata['sparkVersion'])
+
+major = majorAndMinorVersions[0]
+minor = majorAndMinorVersions[1]
+# For metadata file prior to Spark 2.4, there is no default section.
+if major > 2 or (major == 2 and minor >= 4):
+assert 'defaultParamMap' in metadata, "Error loading metadata: Expected " + \
+"`defaultParamMap` section not found"
+
+for paramName in metadata['defaultParamMap']:
+param = instance.getParam(paramName)
+paramValue = metadata['defaultParamMap'][paramName]
+instance._setDefault(**{param.name: paramValue})
--- End diff --

Oh, right. My bad.


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184626842
  
--- Diff: python/pyspark/ml/util.py ---
@@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata):
 """
 Extract Params from metadata, and set them in the instance.
 """
+# User-supplied param values
 for paramName in metadata['paramMap']:
 param = instance.getParam(paramName)
 paramValue = metadata['paramMap'][paramName]
 instance.set(param, paramValue)
 
+# Default param values
+majorAndMinorVersions = majorMinorVersion(metadata['sparkVersion'])
+assert majorAndMinorVersions is not None, "Error loading metadata: Expected " + \
+"Spark version string but found {}".format(metadata['sparkVersion'])
+
+major = majorAndMinorVersions[0]
+minor = majorAndMinorVersions[1]
+# For metadata file prior to Spark 2.4, there is no default section.
+if major > 2 or (major == 2 and minor >= 4):
+assert 'defaultParamMap' in metadata, "Error loading metadata: Expected " + \
+"`defaultParamMap` section not found"
+
+for paramName in metadata['defaultParamMap']:
+param = instance.getParam(paramName)
+paramValue = metadata['defaultParamMap'][paramName]
+instance._setDefault(**{param.name: paramValue})
--- End diff --

Remove the line `param = instance.getParam(paramName)` and change this line to
`instance._setDefault(**{paramName: paramValue})`.
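
The suggested simplification works because `_setDefault` takes keyword arguments keyed by param name, so the `getParam` lookup adds nothing. A runnable sketch with a minimal stand-in object (illustration only; the real `Params._setDefault` keys its map by `Param` object):

```python
class FakeParams(object):
    """Minimal stand-in for a pyspark.ml Params object (illustration only)."""
    def __init__(self):
        self._defaultParamMap = {}

    def _setDefault(self, **kwargs):
        # Simplified: store defaults keyed by param name.
        self._defaultParamMap.update(kwargs)


metadata = {'defaultParamMap': {'maxIter': 100, 'tol': 1e-6}}
instance = FakeParams()

# The simplified loop: no getParam lookup needed, since _setDefault
# accepts keyword arguments keyed directly by param name.
for paramName in metadata['defaultParamMap']:
    paramValue = metadata['defaultParamMap'][paramName]
    instance._setDefault(**{paramName: paramValue})
```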


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184620777
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
--- End diff --

I think using `_paramMap.copy()` would be simpler.


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184620855
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
+
+# Default param values
+jsonDefaultParams = {}
+for p in instance._defaultParamMap:
+jsonDefaultParams[p.name] = instance._defaultParamMap[p]
--- End diff --

Similarly, use `_defaultParamMap.copy()`.


---




[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-25 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/21153

[SPARK-24058][ML][PySpark] Default Params in ML should be saved separately: Python API

## What changes were proposed in this pull request?

See SPARK-23455 for reference. Default params in ML are now saved separately in the metadata file on the Scala side. We must make the same change for Python in Spark 2.4.0 to keep the two implementations in sync.

## How was this patch tested?

Added test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-24058

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21153.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21153


commit 0e9b18a7913c65e8a9b1c2304e2565a5ae9fbfbd
Author: Liang-Chi Hsieh 
Date:   2018-04-25T09:58:52Z

Default Params in ML should be saved separately in Python API.




---
