[
https://issues.apache.org/jira/browse/SPARK-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329009#comment-15329009
]
Apache Spark commented on SPARK-15938:
--------------------------------------
User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/13656
> Adding "support" property to MLlib Association Rule
> ---------------------------------------------------
>
> Key: SPARK-15938
> URL: https://issues.apache.org/jira/browse/SPARK-15938
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: yuhao yang
> Priority: Minor
>
> _Support_ indicates how frequently an itemset appears in the database.
> Besides confidence, support is another critical property of an
> association rule.
> References:
> https://en.wikipedia.org/wiki/Association_rule_learning
> http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
> https://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
> _Support_ can be either the count of appearances or the fraction within the
> dataset. I chose to use the count, for two reasons:
> 1. API compatibility: currently neither FPGrowthModel nor Association Rule
> has information about the size of the dataset. I'd like to avoid breaking
> a list of public APIs.
> 2. It follows the API of SPMF:
> http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
> As next steps, we could add a constraint such as minSupport, as in other
> libraries. FPGrowthModel should also contain the size of the dataset.
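To illustrate the count-versus-fraction distinction in the description above, here is a minimal sketch in plain Python. It does not use the Spark API; the helper `support_count` and the toy transactions are hypothetical and exist only for this example. It also shows why the fraction form needs the dataset size, while the count form does not.

```python
from typing import FrozenSet, Iterable, List


def support_count(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Hypothetical helper: number of transactions containing every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


# Toy dataset (made up for illustration): five transactions.
transactions = [
    frozenset(t)
    for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])
]

# Rule under consideration: {a} => {b}
antecedent = {"a"}
consequent = {"b"}

# Support as a count: transactions containing antecedent and consequent together.
rule_count = support_count(transactions, antecedent | consequent)

# Confidence needs only counts: count(antecedent + consequent) / count(antecedent).
antecedent_count = support_count(transactions, antecedent)
confidence = rule_count / antecedent_count

# Support as a fraction additionally needs the size of the dataset.
rule_fraction = rule_count / len(transactions)

print(f"support (count)    = {rule_count}")
print(f"support (fraction) = {rule_fraction}")
print(f"confidence         = {confidence}")
```

The count and the confidence can both be derived from itemset frequencies alone, whereas the fraction requires `len(transactions)`, which is the piece of information the description says the model does not currently carry.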