[ https://issues.apache.org/jira/browse/SPARK-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828346#comment-15828346 ]

Asher Krim commented on SPARK-15573:
------------------------------------

Any thoughts on how to determine the model version in the loading logic? In a
few places I've seen code that relies on the Spark version, like this:

{code:java}
if (major.toInt < 2 || (major.toInt == 2 && minor.toInt == 0))
{code}

Relying on the Spark version feels wrong to me, for the following reasons:
* It creates a false coupling between two unrelated things: the model's storage
format and the Spark release that happened to write it
* It ties each format change to being released with a particular major/minor
release
* It makes backporting impossible, since a model saved by a maintenance release
that received the change would be misclassified by the version check
* It becomes unwieldy after two or three format changes
* It makes assumptions about the format of the Spark version string, which might
not always hold (I'll admit this last point is a little far-fetched, but it is
not impossible for users who build Spark themselves)

Some model changes involve changing the stored data (i.e., adding or changing
columns or params). IMO, that data is a better way to determine the model
version, since it depends only on what was actually saved with the model.
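
A minimal Scala sketch of that idea, assuming the loader can see the column names
of the saved data (in a real reader they might come from something like
spark.read.parquet(dataPath).columns, where dataPath is a placeholder). The two
layouts and their column names are hypothetical, loosely modeled on the
KMeansModel format change mentioned in the issue description:

{code}
object ModelFormatDetection {
  // Two hypothetical layouts for the saved data of a clustering model.
  sealed trait DataFormat
  case object CentersInOneRow extends DataFormat // one row holding all centers
  case object OneRowPerCenter extends DataFormat // one row per (index, center)

  // Decide the layout from the saved columns alone, without consulting the
  // Spark version. Column names are illustrative assumptions.
  def detectFormat(columns: Set[String]): Either[String, DataFormat] =
    if (Set("clusterIdx", "clusterCenter").subsetOf(columns)) Right(OneRowPerCenter)
    else if (columns.contains("clusterCenters")) Right(CentersInOneRow)
    else Left(s"Unrecognized model data columns: ${columns.toSeq.sorted.mkString(", ")}")

  def main(args: Array[String]): Unit = {
    println(detectFormat(Set("clusterCenters")))
    println(detectFormat(Set("clusterIdx", "clusterCenter")))
    println(detectFormat(Set("weights")))
  }
}
{code}

Because the decision depends only on what was written, the check keeps working
if a format change is backported, and each new layout adds one more case rather
than another version range.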

What do people think about alternatives to relying on the Spark version?

> Backwards-compatible persistence for spark.ml
> ---------------------------------------------
>
>                 Key: SPARK-15573
>                 URL: https://issues.apache.org/jira/browse/SPARK-15573
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>
> This JIRA is for imposing backwards-compatible persistence for the 
> DataFrames-based API for MLlib.  I.e., we want to be able to load models 
> saved in previous versions of Spark.  We will not require loading models 
> saved in later versions of Spark.
> This requires:
> * Putting unit tests in place to check loading models from previous versions
> * Notifying all committers active on MLlib of this requirement so they keep it
> in mind in the future
> The unit tests could be written as in spark.mllib, where we essentially 
> copied and pasted the save() code every time it changed.  This happens 
> rarely, so it should be acceptable, though other designs are fine.
> Subtasks of this JIRA should cover checking and adding tests for existing 
> cases, such as KMeansModel (whose format changed between 1.6 and 2.0).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
