(spark) branch branch-4.0 updated: [SPARK-51140][ML] Sort the params before saving

ruifengz Mon, 10 Feb 2025 16:11:03 -0800

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new fab541d43395 [SPARK-51140][ML] Sort the params before saving
fab541d43395 is described below

commit fab541d43395a61c1b295aa46717d183bf4236ff
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Tue Feb 11 08:10:20 2025 +0800

    [SPARK-51140][ML] Sort the params before saving
    
    ### What changes were proposed in this pull request?
    Sort the params before saving
    
    ### Why are the changes needed?
    To improve debuggability.
    When developing ML Connect, I sometimes need to manually inspect the stored
    models. The params are always saved unsorted, which makes it hard to
    compare them across runs:
    
    before:
    
    ```
    {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1738926090947,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_a5c66eecaec6","paramMap":{"dstCol":"dst","k":2,"weightCol":"weight","maxIter":40,"srcCol":"src","initMode":"random"},"defaultParamMap":{"dstCol":"dst","k":2,"maxIter":20,"srcCol":"src","initMode":"random"}}
    ```
    
    ```
    {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1738926386839,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_b91c5734d913","paramMap":{"k":2,"initMode":"random","weightCol":"weight","srcCol":"src","maxIter":40,"dstCol":"dst"},"defaultParamMap":{"k":2,"initMode":"random","srcCol":"src","maxIter":20,"dstCol":"dst"}}
    ```
    
    after:
    ```
    {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1739154410677,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_483e02530367","paramMap":{"k":2,"maxIter":40,"weightCol":"weight"},"defaultParamMap":{"dstCol":"dst","initMode":"random","k":2,"maxIter":20,"srcCol":"src"}}
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    no
    
    ### How was this patch tested?
    Existing tests and manual checks.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #49861 from zhengruifeng/ml_store_params.
    
    Authored-by: Ruifeng Zheng <ruife...@apache.org>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
    (cherry picked from commit 8071868d1fc9c78a46d0095e38b450e88abcd4e6)
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala b/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
index dcb337218edc..d155f257d230 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
@@ -486,8 +486,8 @@ private[ml] object DefaultParamsWriter {
       paramMap: Option[JValue]): String = {
     val uid = instance.uid
     val cls = instance.getClass.getName
-    val params = instance.paramMap.toSeq
-    val defaultParams = instance.defaultParamMap.toSeq
+    val params = instance.paramMap.toSeq.sortBy(_.param.name)
+    val defaultParams = instance.defaultParamMap.toSeq.sortBy(_.param.name)
     val jsonParams = paramMap.getOrElse(render(params.map { case ParamPair(p, v) =>
       p.name -> parse(p.jsonEncode(v))
     }.toList))
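
The effect of the `sortBy(_.param.name)` calls above can be illustrated with a minimal standalone sketch. It assumes nothing from Spark: `Pair` and `SortedParamsSketch` are hypothetical stand-ins (not Spark's `ParamPair` or its json4s rendering), and values are passed as already-encoded JSON strings. It only shows that sorting by name before rendering makes the output independent of the input order.

```scala
// Hypothetical stand-in for a (param name, JSON-encoded value) pair.
// This is NOT Spark's ParamPair; it exists only for illustration.
final case class Pair(name: String, jsonValue: String)

object SortedParamsSketch {
  // Render the pairs as a JSON object, with keys sorted by name
  // (plain String ordering, which is case-sensitive).
  def render(pairs: Seq[Pair]): String =
    pairs
      .sortBy(_.name)
      .map(p => "\"" + p.name + "\":" + p.jsonValue)
      .mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    // The same params, arriving in two different orders.
    val run1 = Seq(Pair("dstCol", "\"dst\""), Pair("k", "2"), Pair("maxIter", "40"))
    val run2 = Seq(Pair("k", "2"), Pair("maxIter", "40"), Pair("dstCol", "\"dst\""))

    println(render(run1))                 // {"dstCol":"dst","k":2,"maxIter":40}
    println(render(run1) == render(run2)) // true
  }
}
```

With a stable key order, two saved metadata files can be compared with a plain text diff, which is the debugging use case the commit message describes.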


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
