[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reassigned SPARK-23318:
---------------------------------

    Assignee: Arseniy Tashoyan

> FP-growth: WARN FPGrowth: Input data is not cached
> --------------------------------------------------
>
>                 Key: SPARK-23318
>                 URL: https://issues.apache.org/jira/browse/SPARK-23318
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.1
>            Reporter: Arseniy Tashoyan
>            Assignee: Arseniy Tashoyan
>            Priority: Minor
>              Labels: MLLib,, fp-growth
>             Fix For: 2.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running FPGrowth.fit() from the _ml_ package, one can see a warning:
> WARN FPGrowth: Input data is not cached.
> This warning occurs even when the dataset of transactions is cached.
> The warning actually comes from the FPGrowth implementation in the old _mllib_
> package. The new FPGrowth performs some transformations on the input dataset of
> transactions and then passes the result to the old FPGrowth without caching it,
> hence the warning.
> The problem looks similar to SPARK-18356.
> If you don't mind, I can push a similar fix:
> {code}
> // ml.FPGrowth
> val handlePersistence = dataset.storageLevel == StorageLevel.NONE
> if (handlePersistence) {
>   // cache the data
> }
> // then call mllib.FPGrowth
> // finally unpersist the data
> {code}
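The handlePersistence pattern from the proposal can be sketched as a small user-side helper. This is a minimal sketch, not the change merged for this issue. It assumes Spark on the classpath, a Spark version that provides `Dataset.storageLevel` (the proposal relies on it too), and a DataFrame whose items column is an array of strings. The object `FPGrowthPersistenceSketch` and its method `frequentItemsets` are hypothetical names, not part of Spark. The helper checks the input's storage level, persists the derived RDD only when the caller has not cached the input, runs `org.apache.spark.mllib.fpm.FPGrowth`, and unpersists in a `finally` block.

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper illustrating the handlePersistence pattern; not Spark API.
object FPGrowthPersistenceSketch {

  /** Runs mllib FPGrowth on `itemsCol` (assumed array<string>) and returns
    * (itemset, frequency) pairs collected to the driver. */
  def frequentItemsets(
      df: DataFrame,
      itemsCol: String,
      minSupport: Double): Array[(Seq[String], Long)] = {
    // NONE means the caller did not cache the input DataFrame.
    val handlePersistence = df.storageLevel == StorageLevel.NONE

    // This transformation yields a new RDD with its own (initially NONE)
    // storage level; mllib FPGrowth.run warns "Input data is not cached."
    // when the RDD it receives has storage level NONE.
    val transactions = df.select(itemsCol).rdd.map(_.getSeq[String](0).toArray)

    if (handlePersistence) {
      // mllib FPGrowth scans its input more than once, so cache it here.
      transactions.persist(StorageLevel.MEMORY_AND_DISK)
    }
    try {
      val model = new FPGrowth().setMinSupport(minSupport).run(transactions)
      // freqItemsets is computed lazily from `transactions`, so materialize
      // it before the cache is released below.
      model.freqItemsets.map(f => (f.items.toSeq, f.freq)).collect()
    } finally {
      // Release only what this helper cached; the caller's cache is untouched.
      if (handlePersistence) {
        transactions.unpersist()
      }
    }
  }
}
```

Collecting the itemsets before unpersisting is a deliberate choice for the sketch: releasing the cache first would make later actions on `freqItemsets` recompute the transformation from uncached input. For large results, persisting and counting `freqItemsets` before unpersisting would avoid pulling everything to the driver.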