Arseniy Tashoyan created SPARK-23318:
----------------------------------------

             Summary: FP-growth: WARN FPGrowth: Input data is not cached
                 Key: SPARK-23318
                 URL: https://issues.apache.org/jira/browse/SPARK-23318
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.1
            Reporter: Arseniy Tashoyan


When running FPGrowth.fit() from the _ml_ package, one can see a warning:

WARN FPGrowth: Input data is not cached.

This warning occurs even when the dataset of transactions is cached.
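
A minimal sketch of how this can be reproduced, assuming a local SparkSession and a toy set of transactions. The object name, the method name, the column name "items", and the support threshold are illustrative choices, not taken from a real job:

```scala
import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction helper; not part of Spark.
object FPGrowthWarningRepro {
  // Fits ml.FPGrowth on an input that is cached and materialized up front.
  def fitOnCachedInput(spark: SparkSession): FPGrowthModel = {
    import spark.implicits._
    // Toy transactions; the column name "items" is an illustrative choice.
    val transactions = Seq(Seq("a", "b"), Seq("a", "c"), Seq("a"))
      .toDF("items")
      .cache()
    transactions.count() // materialize the cache before fitting
    new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5)
      .fit(transactions)
  }
}
```

Calling FPGrowthWarningRepro.fitOnCachedInput(spark) on an affected version should be enough to check the driver log for the warning, even though the input was cached explicitly.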

Actually, this warning comes from the FPGrowth implementation in the old
_mllib_ package. The new FPGrowth performs some transformations on the input
dataset of transactions and then passes the result to the old FPGrowth without
caching it, hence the warning.

The problem looks similar to SPARK-18356.
If you don't mind, I can push a similar fix:
{code}
// ml.FPGrowth
val handlePersistence = dataset.storageLevel == StorageLevel.NONE
if (handlePersistence) {
  // cache the data
}
// then call mllib.FPGrowth
// finally unpersist the data
{code}
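
The outline above could be fleshed out roughly as follows. This is only a sketch under stated assumptions, not the actual patch: the object and method names are hypothetical, items are assumed to be strings, and MEMORY_AND_DISK is an assumed storage level. The derived RDD is persisted only when the caller has not cached the input, and it is released once the result has been collected.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Hypothetical helper illustrating the proposed pattern; not part of Spark.
object FPGrowthPersistenceSketch {
  def frequentItemsets(
      dataset: Dataset[_],
      itemsCol: String,
      minSupport: Double): Array[(Seq[String], Long)] = {
    // Take over caching only if the caller has not cached the input.
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE

    // The transformation that ml.FPGrowth applies before delegating to mllib.
    val items: RDD[Array[String]] =
      dataset.select(itemsCol).rdd.map(row => row.getSeq[String](0).toArray)

    if (handlePersistence) items.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Collect inside the try block so the cached RDD is still available.
      new MLlibFPGrowth()
        .setMinSupport(minSupport)
        .run(items)
        .freqItemsets
        .collect()
        .map(itemset => (itemset.items.toSeq, itemset.freq))
    } finally {
      // Release the cache only if this helper created it.
      if (handlePersistence) items.unpersist()
    }
  }
}
```

When the caller has already cached the dataset, this sketch leaves the derived RDD unpersisted, so in that case the mllib code may still see an uncached RDD.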


