Dong Wang created SPARK-29810:

             Summary: Missing persist on retaggedInput in
                 Key: SPARK-29810
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.4.3
            Reporter: Dong Wang

The RDD retaggedInput should be persisted in, 
because it is used by more than one action.

  def run(
      input: RDD[LabeledPoint],
      strategy: OldStrategy,
      numTrees: Int,
      featureSubsetStrategy: String,
      seed: Long,
      instr: Option[Instrumentation],
      prune: Boolean = true, // exposed for testing only, real trees are always 
      parentUID: Option[String] = None): Array[DecisionTreeModel] = {

    val timer = new TimeTracker()
    val retaggedInput = input.retag(classOf[LabeledPoint]) // it needs to be 

This issue was reported by our tool CacheCheck, which dynamically 
detects misuses of the persist()/unpersist() API.
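
For illustration, here is a minimal sketch of why persisting an RDD that feeds 
more than one action matters. It is not the patch for RandomForest.run. The 
object PersistSketch, the method evaluations, the RDD named derived, and the 
MEMORY_AND_DISK storage level are all assumptions made for this example; the 
issue itself does not name a storage level. Only SparkContext.parallelize, 
longAccumulator, RDD.persist, RDD.unpersist, count and reduce are real Spark calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; not part of Spark or of RandomForest.run.
object PersistSketch {

  /** Runs two actions on one derived RDD and returns how many times
    * its map function was evaluated across both actions. */
  def evaluations(sc: SparkContext, persist: Boolean): Long = {
    val counter = sc.longAccumulator("map evaluations")

    // Stand-in for retaggedInput: an RDD produced by a transformation.
    val derived = sc.parallelize(1 to 100, numSlices = 4).map { x =>
      counter.add(1)
      x * 2
    }

    // Storage level is an assumption; the issue does not name one.
    if (persist) derived.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      derived.count()        // first action
      derived.reduce(_ + _)  // second action
    } finally {
      // Release the cached blocks once both actions have run.
      if (persist) derived.unpersist()
    }
    counter.value
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"with persist:    ${evaluations(sc, persist = true)}")
      println(s"without persist: ${evaluations(sc, persist = false)}")
    } finally {
      sc.stop()
    }
  }
}
```

In a local run with no task retries, the persisted variant should evaluate the 
map function once per element, while the unpersisted variant evaluates it once 
per element for each action. Accumulator updates made inside transformations 
can be repeated if tasks are re-executed, so the counter is only a rough 
diagnostic.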
