[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491974#comment-15491974 ]
yuhao yang commented on SPARK-15573:
------------------------------------

This sounds feasible. As I see it, there are two primary work items:
1. Find a standardized way to save models across different versions.
2. Ensure that model loading is correct for all versions (parameter values may change across versions).

> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
> Key: SPARK-15573
> URL: https://issues.apache.org/jira/browse/SPARK-15573
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence for the
> DataFrames-based API for MLlib. I.e., we want to be able to load models
> saved in previous versions of Spark. We will not require loading models
> saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib to be aware of this requirement in
> the future
> The unit tests could be written as in spark.mllib, where we essentially
> copied and pasted the save() code every time it changed. This happens
> rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing
> cases, such as KMeansModel (whose format changed between 1.6 and 2.0).
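A minimal sketch of both work items, in plain Python with no Spark calls: the loader reads the version recorded with the saved model and dispatches to a matching decoder, rejecting models written by a newer major version. A frozen copy of the old writer then backs a unit test, in the copy-the-save()-code style the description mentions. Everything here is hypothetical: the JSON layout, the `sparkVersion` field, the function names, and both data formats are illustrative stand-ins, not Spark's actual on-disk format or the real KMeansModel 1.6 to 2.0 change.

```python
import json
import re
from typing import List, Tuple

# Hypothetical model payload: a list of cluster centers.
Centers = List[List[float]]

# Hypothetical: the newest major version this loader knows how to read.
CURRENT_MAJOR = 2


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.6.3' or '2.0.0-SNAPSHOT'."""
    m = re.match(r"^(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def save_v1_6(centers: Centers) -> str:
    """Frozen copy of a hypothetical 1.6 writer: centers stored as a bare list."""
    return json.dumps({"metadata": {"sparkVersion": "1.6.3"}, "data": centers})


def save_current(centers: Centers, version: str = "2.0.0") -> str:
    """Hypothetical current writer: each row carries its cluster index."""
    rows = [{"clusterIdx": i, "clusterCenter": c} for i, c in enumerate(centers)]
    return json.dumps({"metadata": {"sparkVersion": version}, "data": rows})


def load(serialized: str) -> Centers:
    """Pick a decoder based on the version that wrote the model."""
    blob = json.loads(serialized)
    major, _ = major_minor(blob["metadata"]["sparkVersion"])
    if major > CURRENT_MAJOR:
        # Loading models saved by later versions is explicitly not required.
        raise ValueError(f"cannot load a model saved by major version {major}")
    if major < 2:
        # Old layout: row order is the cluster order.
        return [list(c) for c in blob["data"]]
    # New layout: rows may come back in any order, so sort by the stored index.
    rows = sorted(blob["data"], key=lambda r: r["clusterIdx"])
    return [list(r["clusterCenter"]) for r in rows]


# Backward-compatibility unit test: every writer still supported must round-trip.
centers = [[0.0, 1.0], [5.0, 5.0]]
assert load(save_v1_6(centers)) == centers
assert load(save_current(centers)) == centers
```

Keeping `save_v1_6` frozen, rather than deriving it from the current writer, keeps the old-format test meaningful after the current format changes again; the cost is one more copied writer per format change.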