[ 
https://issues.apache.org/jira/browse/SPARK-39544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

koba updated SPARK-39544:
-------------------------
    Description: 
The name set for rawPredictionCol in OneVsRest does not persist after saving
and loading a trained model. This becomes an issue when I try to stack multiple
OneVsRest models in a pipeline. Code example below.

{{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
{{data_path = "/sample_multiclass_classification_data.txt"}}
{{df = spark.read.format("libsvm").load(data_path)}}
{{lr = LinearSVC(regParam=0.01)}}
{{# set the name of rawPrediction column}}
{{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
{{print(ovr.getRawPredictionCol())}}
{{model = ovr.fit(df)}}
{{model_path = 'temp' + "/ovr_model"}}
{{model.write().overwrite().save(model_path)}}
{{model2 = OneVsRestModel.load(model_path)}}
{{model2.getRawPredictionCol()}}

{{Output:}}

{{raw_prediction}}
{{'rawPrediction'}}

 

  was:
The name set for `rawPredictionCol` in `OneVsRest` does not persist after
saving and loading a trained model. This becomes an issue when I try to stack
multiple OneVsRest models in a pipeline. Code example below.

{{```}}

{{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
{{data_path = "/sample_multiclass_classification_data.txt"}}
{{df = spark.read.format("libsvm").load(data_path)}}
{{lr = LinearSVC(regParam=0.01)}}
{{# set the name of rawPrediction column}}
{{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
{{print(ovr.getRawPredictionCol())}}
{{model = ovr.fit(df)}}
{{model_path = 'temp' + "/ovr_model"}}
{{model.write().overwrite().save(model_path)}}
{{model2 = OneVsRestModel.load(model_path)}}
{{model2.getRawPredictionCol()}}

{{Output:}}

{{raw_prediction}}
{{'rawPrediction'}}

{{```}}


> setPredictionCol for OneVsRest does not persist when saving model to disk
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39544
>                 URL: https://issues.apache.org/jira/browse/SPARK-39544
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.2.1, 3.3.0
>         Environment: Python 3.6
> Spark 3.2
>            Reporter: koba
>            Priority: Major
>
> The name set for rawPredictionCol in OneVsRest does not persist after saving
> and loading a trained model. This becomes an issue when I try to stack
> multiple OneVsRest models in a pipeline. Code example below.
> {{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
> {{data_path = "/sample_multiclass_classification_data.txt"}}
> {{df = spark.read.format("libsvm").load(data_path)}}
> {{lr = LinearSVC(regParam=0.01)}}
> {{# set the name of rawPrediction column}}
> {{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
> {{print(ovr.getRawPredictionCol())}}
> {{model = ovr.fit(df)}}
> {{model_path = 'temp' + "/ovr_model"}}
> {{model.write().overwrite().save(model_path)}}
> {{model2 = OneVsRestModel.load(model_path)}}
> {{model2.getRawPredictionCol()}}
> {{Output:}}
> {{raw_prediction}}
> {{'rawPrediction'}}
>  
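
A possible workaround for the behaviour described in the issue, offered only as a sketch: re-apply the column name after loading. The helper name restore_raw_prediction_col is hypothetical and not part of PySpark. It assumes the loaded model exposes getRawPredictionCol() and setRawPredictionCol(), which OneVsRestModel is understood to do from PySpark 3.0 onward, and that the caller still knows the name that was originally configured. It has not been verified against every affected version.

```python
def restore_raw_prediction_col(model, expected_col):
    """Re-apply the raw prediction column name on a loaded model.

    Hypothetical helper, not part of PySpark. Assumes `model` provides
    getRawPredictionCol() and setRawPredictionCol().
    """
    if model.getRawPredictionCol() != expected_col:
        # The configured name was lost on load, so set it again explicitly.
        model.setRawPredictionCol(expected_col)
    return model


# Intended usage with the objects from the report (needs a Spark session):
# model2 = restore_raw_prediction_col(
#     OneVsRestModel.load(model_path), "raw_prediction")
```

The helper only changes the parameter on the in-memory model. It does not alter what is written to disk, so it would have to be applied after every load.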



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
