lichenglin created SPARK-15497:
----------------------------------

             Summary: DecisionTreeClassificationModel can't be saved within a Pipeline because it does not implement Writable
                 Key: SPARK-15497
                 URL: https://issues.apache.org/jira/browse/SPARK-15497
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.6.1
            Reporter: lichenglin


Here is my code:
{code}
SQLContext sqlContext = getSQLContext();
DataFrame data = sqlContext.read().format("libsvm")
  .load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");
// Index labels, adding metadata to the label column.
// Fit on whole dataset to include all labels in index.
StringIndexerModel labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(data);
// Automatically identify categorical features, and index them.
// Features with > 4 distinct values are treated as continuous.
VectorIndexerModel featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)
  .fit(data);

// Split the data into training and test sets (30% held out for testing).
DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
DataFrame trainingData = splits[0];
DataFrame testData = splits[1];

// Train a DecisionTree model.
DecisionTreeClassifier dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures");

// Convert indexed labels back to original labels.
IndexToString labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels());

// Chain indexers and tree in a Pipeline.
Pipeline pipeline = new Pipeline()
  .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});

// Train model. This also runs the indexers.
PipelineModel model = pipeline.fit(trainingData);
// Saving the fitted PipelineModel is the call that throws.
model.save("file:///e:/tmpmodel");
{code}

And here is the exception:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
        at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
        at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
        at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
        at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
        at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
        at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
        at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
{code}
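
Judging from the message and the validateStages frames in the trace, the save appears to be rejected up front because one stage in the fitted pipeline is not writable. The following is only a rough, Spark-free sketch of that kind of fail-fast check, not Spark's actual code: the class StageCheck, the marker interface Writable, and the Stage types are all hypothetical stand-ins I made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StageCheck {
    // Hypothetical marker interface standing in for "this stage can be saved".
    interface Writable {}

    // Hypothetical stand-in for a pipeline stage, identified by its uid.
    static class Stage {
        final String uid;
        Stage(String uid) { this.uid = uid; }
    }

    // A stage that can be persisted.
    static class WritableStage extends Stage implements Writable {
        WritableStage(String uid) { super(uid); }
    }

    // Collects the uids of all stages that do not implement Writable.
    static List<String> nonWritable(List<Stage> stages) {
        List<String> bad = new ArrayList<>();
        for (Stage s : stages) {
            if (!(s instanceof Writable)) {
                bad.add(s.uid);
            }
        }
        return bad;
    }

    // Fail fast: reject the whole pipeline if any stage cannot be written.
    static void validateStages(List<Stage> stages) {
        List<String> bad = nonWritable(stages);
        if (!bad.isEmpty()) {
            throw new UnsupportedOperationException("Non-Writable stage(s): " + bad);
        }
    }

    public static void main(String[] args) {
        // Same shape as the pipeline above: three writable stages, one that is not.
        List<Stage> stages = Arrays.asList(
            new WritableStage("strIdx"),
            new WritableStage("vecIdx"),
            new Stage("dtc"),
            new WritableStage("idxToStr"));
        System.out.println(nonWritable(stages)); // prints [dtc]
    }
}
```

In this sketch a single non-writable stage is enough to block the whole save, which matches what I see: the two indexers and the label converter are not reported, only the decision tree model is.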

sample_libsvm_data.txt is the sample data file included in the 1.6.1 release tarball.


