Arseniy Tashoyan created SPARK-23318:
----------------------------------------

             Summary: FP-growth: WARN FPGrowth: Input data is not cached
                 Key: SPARK-23318
                 URL: https://issues.apache.org/jira/browse/SPARK-23318
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.1
            Reporter: Arseniy Tashoyan


When running FPGrowth.fit() from the _ml_ package, one can see a warning:

WARN FPGrowth: Input data is not cached.

This warning occurs even if the dataset of transactions is cached.

The warning actually comes from the FPGrowth implementation in the old _mllib_ package. The new FPGrowth performs some transformations on the input dataset of transactions and then passes the result to the old FPGrowth without caching it. The old implementation therefore sees uncached input, regardless of the storage level of the original dataset. Hence the warning.

The problem looks similar to SPARK-18356.

If you don't mind, I can push a similar fix:

{code}
// ml.FPGrowth
// Sketch only: `items` is a placeholder name for the transformed transactions
// that are passed to mllib.FPGrowth; the storage level below is an assumption.
val handlePersistence = dataset.storageLevel == StorageLevel.NONE
if (handlePersistence) {
  // cache the data
  items.persist(StorageLevel.MEMORY_AND_DISK)
}
// then call mllib.FPGrowth on `items`
// finally unpersist the data, but only if it was cached here
if (handlePersistence) {
  items.unpersist()
}
{code}