[ https://issues.apache.org/jira/browse/SPARK-39544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
koba updated SPARK-39544:
-------------------------
    Description: 
The naming of rawPredictionCol in OneVsRest does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.

{{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
{{data_path = "/sample_multiclass_classification_data.txt"}}
{{df = spark.read.format("libsvm").load(data_path)}}
{{lr = LinearSVC(regParam=0.01)}}
{{# set the name of rawPrediction column}}
{{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
{{print(ovr.getRawPredictionCol())}}
{{model = ovr.fit(df)}}
{{model_path = 'temp' + "/ovr_model"}}
{{model.write().overwrite().save(model_path)}}
{{model2 = OneVsRestModel.load(model_path)}}
{{model2.getRawPredictionCol()}}

{{Output:}}
{{raw_prediction}}
{{'rawPrediction'}}

  was:
The naming of `rawPredictionCol` in `OneVsRest` does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.
{{```}}
{{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
{{data_path = "/sample_multiclass_classification_data.txt"}}
{{df = spark.read.format("libsvm").load(data_path)}}
{{lr = LinearSVC(regParam=0.01)}}
{{# set the name of rawPrediction column}}
{{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
{{print(ovr.getRawPredictionCol())}}
{{model = ovr.fit(df)}}
{{model_path = 'temp' + "/ovr_model"}}
{{model.write().overwrite().save(model_path)}}
{{model2 = OneVsRestModel.load(model_path)}}
{{model2.getRawPredictionCol()}}

{{Output:}}
{{raw_prediction}}
{{'rawPrediction'}}
{{```}}

> setPredictionCol for OneVsRest does not persist when saving model to disk
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39544
>                 URL: https://issues.apache.org/jira/browse/SPARK-39544
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0
>         Environment: Python 3.6
> Spark 3.2
>            Reporter: koba
>            Priority: Major
>
> The naming of rawPredictionCol in OneVsRest does not persist after saving and
> loading a trained model. This becomes an issue when I try to stack multiple
> OneVsRest models in a pipeline. Code example below.
> {{from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel}}
> {{data_path = "/sample_multiclass_classification_data.txt"}}
> {{df = spark.read.format("libsvm").load(data_path)}}
> {{lr = LinearSVC(regParam=0.01)}}
> {{# set the name of rawPrediction column}}
> {{ovr = OneVsRest(classifier=lr, rawPredictionCol = 'raw_prediction')}}
> {{print(ovr.getRawPredictionCol())}}
> {{model = ovr.fit(df)}}
> {{model_path = 'temp' + "/ovr_model"}}
> {{model.write().overwrite().save(model_path)}}
> {{model2 = OneVsRestModel.load(model_path)}}
> {{model2.getRawPredictionCol()}}
>
> {{Output:}}
> {{raw_prediction}}
> {{'rawPrediction'}}