[ https://issues.apache.org/jira/browse/SPARK-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-23269.
-------------------------------
Resolution: Won't Fix
> FP-growth: Provide last transaction for each detected frequent pattern
> ----------------------------------------------------------------------
>
> Key: SPARK-23269
> URL: https://issues.apache.org/jira/browse/SPARK-23269
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.1
> Reporter: Arseniy Tashoyan
> Priority: Minor
> Labels: MLlib, fp-growth
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> The FP-growth implementation returns patterns and their frequencies:
> _model.freqItemsets_:
> ||items||freq||
> |[5]|3|
> |[5, 1]|3|
> It would be useful to know when each pattern last occurred, i.e. which is the
> last transaction containing the pattern.
> To support this, FPGrowth would need to be told which column of the
> transactions data frame holds the timestamp:
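> {code:java}
> import org.apache.spark.ml.fpm.FPGrowth
> // setTimestampCol is the setter proposed here; FPGrowth does not have it today
> val fpgrowth = new FPGrowth()
>   .setItemsCol("items")
>   .setTimestampCol("timestamp")
> {code}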
> So the data frame with patterns could look like:
> ||items||freq||lastOccurrence||
> |[5]|3|2018-01-01 12:15:00|
> |[5, 1]|3|2018-01-01 12:15:00|
> Without this functionality, the caller has to traverse the transactions data
> frame again with the set of detected patterns and determine the last
> transaction for each pattern. Why traverse the transactions a second time when
> FP-growth has already done so during its execution?
>
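
For reference, the second pass described in the issue could look roughly like the sketch below. It is only an illustration, assuming Spark's org.apache.spark.ml.fpm.FPGrowth, integer items, and a column named "timestamp". The object LastOccurrence, the helper lastOccurrence, and the sample data are hypothetical and not part of Spark.

```scala
import java.sql.Timestamp

import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, udf}

object LastOccurrence {

  // Hypothetical helper: for each pattern, find the latest timestamp among
  // the transactions that contain every item of the pattern.
  // Expects patterns(items, freq) and transactions(items, timestamp).
  def lastOccurrence(patterns: DataFrame, transactions: DataFrame): DataFrame = {
    // True when every item of the pattern appears in the transaction.
    val containsAll =
      udf((txn: Seq[Int], pattern: Seq[Int]) => pattern.forall(txn.contains))

    patterns
      .withColumnRenamed("items", "pattern")
      // Explicit cross join: every pattern is compared with every transaction.
      .crossJoin(transactions)
      .filter(containsAll(col("items"), col("pattern")))
      .groupBy("pattern", "freq")
      .agg(max("timestamp").as("lastOccurrence"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("last-occurrence")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample transactions, one row per transaction.
    val transactions = Seq(
      (Seq(1, 5), Timestamp.valueOf("2018-01-01 12:00:00")),
      (Seq(1, 2, 5), Timestamp.valueOf("2018-01-01 12:10:00")),
      (Seq(1, 5), Timestamp.valueOf("2018-01-01 12:15:00"))
    ).toDF("items", "timestamp")

    val model = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.9)
      .fit(transactions)

    lastOccurrence(model.freqItemsets, transactions).show(truncate = false)

    spark.stop()
  }
}
```

The cross join compares every pattern with every transaction, so its cost grows with the product of the two sizes. That is the extra traversal the issue would like to avoid.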
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)