Arseniy Tashoyan created SPARK-23318:

             Summary: FP-growth: WARN FPGrowth: Input data is not cached
                 Key: SPARK-23318
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.1
            Reporter: Arseniy Tashoyan

When running FPGrowth from the _ml_ package, one can see a warning:

WARN FPGrowth: Input data is not cached.

This warning occurs even when the dataset of transactions is cached.

The warning actually comes from the FPGrowth implementation in the old _mllib_
package. The new FPGrowth performs some transformations on the input dataset of
transactions and then passes the result to the old FPGrowth without caching it.
Hence the warning.

The problem looks similar to SPARK-18356.
If you don't mind, I can push a similar fix:
// ml.FPGrowth
val handlePersistence = dataset.storageLevel == StorageLevel.NONE
if (handlePersistence) {
  // cache the data
}
// then call mllib.FPGrowth
// finally unpersist the data (only if it was cached here)
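
For illustration, the outline above could be fleshed out roughly as follows. This is only a sketch under assumptions, not the actual patch: the object FPGrowthPersistenceSketch, the method fitWithPersistence, and the choice of MEMORY_AND_DISK are hypothetical, and the input is assumed to be a Dataset[Seq[String]] rather than the DataFrame column that ml.FPGrowth really works with.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth}
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Hypothetical helper for illustration only; not part of Spark.
object FPGrowthPersistenceSketch {

  def fitWithPersistence(
      transactions: Dataset[Seq[String]],
      minSupport: Double): Array[MLlibFPGrowth.FreqItemset[String]] = {
    // Take over persistence only if the caller has not cached the input.
    val handlePersistence = transactions.storageLevel == StorageLevel.NONE

    // The transformation that ml.FPGrowth applies before delegating:
    // the derived RDD is a new, uncached RDD.
    val items = transactions.rdd.map(_.toArray)

    if (handlePersistence) {
      // Assumed storage level; the real fix may choose differently.
      items.persist(StorageLevel.MEMORY_AND_DISK)
    }

    val model = new MLlibFPGrowth().setMinSupport(minSupport).run(items)

    // freqItemsets is evaluated lazily, so materialize the result
    // before releasing the cached input. collect() is fine for a
    // sketch but may be too large for the driver on real data.
    val result = model.freqItemsets.collect()

    if (handlePersistence) {
      items.unpersist()
    }
    result
  }
}
```

One point to keep in mind with this pattern: the old FPGrowth checks the storage level of the RDD it receives, so when the caller has already cached the dataset, the derived RDD is still not persisted and the warning may still be logged.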
