[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855097#comment-15855097 ]
Joseph K. Bradley commented on SPARK-15573:
-------------------------------------------
It's a good point that we can't make updates to older Spark releases for
persistence. However, I doubt that we would backport many such fixes for
non-bugs. The issue you reference is arguably a scalability limit, not a bug.
Still, adding an internal ML persistence version is a good idea; I'd be OK with
it.
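The comment above endorses an internal ML persistence version. Below is a minimal Python sketch of that idea, not Spark code. It assumes a JSON metadata record with a hypothetical `formatVersion` field, which is not a real Spark field. The loader dispatches on the tag and treats untagged files as the oldest format. It rejects versions newer than it knows, since loading models saved in later releases is not required.

```python
import json

# Hypothetical internal persistence version, bumped only when this model's
# on-disk format changes (independent of the Spark release number).
CURRENT_FORMAT_VERSION = 2


def save_metadata(params: dict) -> str:
    """Serialize model metadata, tagging it with the current format version."""
    return json.dumps({"formatVersion": CURRENT_FORMAT_VERSION, "params": params})


def load_metadata(text: str) -> dict:
    """Load metadata written by this or any earlier (hypothetical) format version."""
    meta = json.loads(text)
    # Files written before versioning existed carry no tag: treat them as version 1.
    version = meta.get("formatVersion", 1)
    if version > CURRENT_FORMAT_VERSION:
        # Forward compatibility (loading newer formats) is explicitly not required.
        raise ValueError(
            f"format version {version} is newer than supported {CURRENT_FORMAT_VERSION}"
        )
    if version == 1:
        # Hypothetical v1 layout: params stored at the top level of the record.
        params = {k: v for k, v in meta.items() if k != "formatVersion"}
    else:
        # v2 layout: params nested under a "params" key.
        params = meta["params"]
    return {"formatVersion": version, "params": params}
```

Decoupling the tag from the Spark release number means the loader only needs a new branch when the on-disk layout actually changes, not on every release.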
> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
> Key: SPARK-15573
> URL: https://issues.apache.org/jira/browse/SPARK-15573
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence for the
> DataFrames-based API for MLlib. I.e., we want to be able to load models
> saved in previous versions of Spark. We will not require loading models
> saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib of this requirement so that they
> keep it in mind in the future
> The unit tests could be written as in spark.mllib, where we essentially
> copied and pasted the save() code every time it changed. This happens
> rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing
> cases, such as KMeansModel (whose format changed between 1.6 and 2.0).
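The spark.mllib-style test described in the quoted issue can be sketched as follows. This uses a hypothetical toy model in Python, not Spark code. `save_v1_frozen` stands for a frozen copy of an older save(), while `save` and `load` stand for the current code. The JSON layouts only loosely echo a centers-based model such as KMeansModel and are not its actual on-disk format.

```python
import json

# Hypothetical toy model: a list of cluster centers. Field names and layouts
# are illustrative only, not Spark's actual KMeansModel format.


def save_v1_frozen(centers):
    """Frozen copy of the old save() code, kept only for compatibility tests.

    Old (hypothetical) layout: a single record holding all centers.
    """
    return json.dumps({"version": 1, "clusterCenters": centers})


def save(centers):
    """Current save(): one record per cluster, tagged with its index."""
    rows = [{"clusterIdx": i, "clusterCenter": c} for i, c in enumerate(centers)]
    return json.dumps({"version": 2, "rows": rows})


def load(text):
    """Current load(): must accept every format written by an earlier release."""
    data = json.loads(text)
    if data["version"] == 1:
        return data["clusterCenters"]
    if data["version"] == 2:
        rows = sorted(data["rows"], key=lambda r: r["clusterIdx"])
        return [r["clusterCenter"] for r in rows]
    raise ValueError(f"unsupported format version {data['version']}")


def test_load_previous_formats():
    centers = [[0.0, 1.0], [2.5, -1.0]]
    # Data written by the frozen old code path must load with the new loader.
    assert load(save_v1_frozen(centers)) == centers
    # The current format must also round-trip.
    assert load(save(centers)) == centers


if __name__ == "__main__":
    test_load_previous_formats()
```

Each time the format changes, the previous save() would be copied into a new frozen function and another assertion added. This is tedious, but as the issue notes, such changes are rare.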
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)