Joseph K. Bradley created SPARK-23740:
-----------------------------------------

             Summary: Add FPGrowth Param for filtering out very common items
                 Key: SPARK-23740
                 URL: https://issues.apache.org/jira/browse/SPARK-23740
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.3.0
            Reporter: Joseph K. Bradley


It would be handy to have a Param in FPGrowth for filtering out very common 
items.  The request comes from a use case in which some items appeared in more 
than 99.9% of the rows.  These items carried no useful information, yet they 
caused the algorithm to generate many unnecessary itemsets.  Filtering them out 
beforehand can make the algorithm much faster.
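
As an illustration of the filtering idea only, here is a minimal plain-Python 
sketch of dropping very common items before frequent-itemset mining.  It is 
not Spark code and not the proposed Param: the function name 
`drop_common_items` and the `max_support` threshold are hypothetical names 
chosen for this example.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def drop_common_items(
    transactions: Sequence[Sequence[Hashable]],
    max_support: float,
) -> List[List[Hashable]]:
    """Remove items that occur in more than `max_support` (a fraction) of rows.

    `max_support` is a hypothetical upper bound, mirroring how a lower bound
    such as a minimum support is usually expressed.
    """
    n_rows = len(transactions)
    if n_rows == 0:
        return []

    # Count each item at most once per row, so duplicates do not inflate support.
    row_counts = Counter(item for row in transactions for item in set(row))

    # Items whose row frequency exceeds the threshold are treated as uninformative.
    too_common = {
        item for item, count in row_counts.items() if count / n_rows > max_support
    }

    # Preserve row order and item order; rows may end up empty.
    return [[item for item in row if item not in too_common] for row in transactions]


rows = [["a", "x"], ["a", "y"], ["a", "x", "z"], ["a"]]
filtered = drop_common_items(rows, max_support=0.9)
print(filtered)
```

Here "a" occurs in every row, so it exceeds the 0.9 threshold and is removed, 
while the rarer items are kept.  Rows that become empty are left in place in 
this sketch; a caller could drop them before mining.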




