[jira] [Commented] (SPARK-10697) Lift Calculation in Association Rule mining

Tristan Stevens (JIRA) Sat, 10 Feb 2018 11:33:44 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359623#comment-16359623
 ]


Tristan Stevens commented on SPARK-10697:
-----------------------------------------

[~srowen] a big +1 from me to implementing this. Without Lift, it becomes very 
difficult to assess whether a rule is even worth looking at.

As an example, using the dataset from Wikipedia, the current implementation 
produces the following output:

{code:python}
from pyspark.ml.fpm import FPGrowth
from pyspark.sql import SparkSession

# Reuses the shell's session if one exists, otherwise starts a local one.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (0, ["milk", "bread"]),
    (1, ["butter"]),
    (2, ["beer", "diapers"]),
    (3, ["milk", "bread", "butter"]),
    (4, ["bread"]),
], ["id", "items"])

fpGrowth = FPGrowth(itemsCol="items", minSupport=0.2, minConfidence=0.2)
model = fpGrowth.fit(df)

# Display frequent itemsets.
model.freqItemsets.show()
{code}

|items|freq|
|[milk]|2|
|[milk, butter]|1|
|[milk, butter, br...|1|
|[milk, bread]|2|
|[diapers]|1|
|[diapers, beer]|1|
|[bread]|3|
|[butter]|2|
|[butter, bread]|1|
|[beer]|1|

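To double-check the freq column, here is a minimal pure-Python sketch that counts itemsets by brute force over the same five transactions. It is an illustration under stated assumptions, not Spark's FP-growth: it enumerates every subset of every transaction, which is only feasible on toy data. The helper name frequent_itemsets is hypothetical.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the FPGrowth example above.
transactions = [
    ["milk", "bread"],
    ["butter"],
    ["beer", "diapers"],
    ["milk", "bread", "butter"],
    ["bread"],
]


def frequent_itemsets(transactions, min_support):
    """Hypothetical helper: count every itemset seen in >= min_support * N transactions.

    Brute force over all subsets of each transaction. This is NOT the FP-growth
    algorithm Spark uses; it only reproduces the same counts on tiny inputs.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    min_count = min_support * n
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


freq = frequent_itemsets(transactions, min_support=0.2)
# Print smaller itemsets first, then alphabetically within each size.
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With minSupport=0.2 and five transactions, any itemset seen at least once qualifies, which is why the table lists all ten itemsets.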

{code:python}
# Display generated association rules.
model.associationRules.show()
{code}
 
|antecedent|consequent|confidence|
|[milk]|[butter]|0.5|
|[milk]|[bread]|1.0|
|[milk, butter]|[bread]|1.0|
|[beer]|[diapers]|1.0|
|[bread]|[milk]|0.6666666666666666|
|[bread]|[butter]|0.3333333333333333|
|[milk, bread]|[butter]|0.5|
|[diapers]|[beer]|1.0|
|[butter, bread]|[milk]|1.0|
|[butter]|[milk]|0.5|
|[butter]|[bread]|0.5|


However, this hides the fact that milk -> bread is much less interesting than 
diapers -> beer, even though both rules have a confidence of 1.0. Adding lift 
gives the following:
 
|antecedent|consequent|confidence|lift|
|[milk]|[butter]|0.5|1.25|
|[milk]|[bread]|1.0|1.66666666|
|[milk, butter]|[bread]|1.0|1.66666666|
|[beer]|[diapers]|1.0|5.0|
|[bread]|[milk]|0.6666666666666666|1.66666666|
|[bread]|[butter]|0.3333333333333333|0.83333333|
|[milk, bread]|[butter]|0.5|1.25|
|[diapers]|[beer]|1.0|5.0|
|[butter, bread]|[milk]|1.0|2.5|
|[butter]|[milk]|0.5|1.25|
|[butter]|[bread]|0.5|0.83333333|
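Working two rows by hand shows where the difference comes from. Taking sup as the fraction of the 5 transactions that contain an itemset, both rules have confidence 1.0, but bread appears in 3 of 5 transactions while beer appears in only 1, so the milk -> bread rule is largely explained by bread being common:

```latex
\mathrm{lift}(\{\text{milk}\} \Rightarrow \{\text{bread}\})
  = \frac{\mathrm{sup}(\{\text{milk}, \text{bread}\})}{\mathrm{sup}(\{\text{milk}\})\,\mathrm{sup}(\{\text{bread}\})}
  = \frac{2/5}{(2/5)(3/5)} = \frac{5}{3} \approx 1.67

\mathrm{lift}(\{\text{beer}\} \Rightarrow \{\text{diapers}\})
  = \frac{\mathrm{sup}(\{\text{beer}, \text{diapers}\})}{\mathrm{sup}(\{\text{beer}\})\,\mathrm{sup}(\{\text{diapers}\})}
  = \frac{1/5}{(1/5)(1/5)} = 5
```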



So the proposal would be to add Lift to the Rules class, calculated as
 {{lift(X => Y) = sup(X ∪ Y) / (sup(X) * sup(Y))}}
where sup is the fraction of transactions that contain the given itemset.
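A minimal pure-Python sketch of the proposed formula, assuming relative support (fraction of transactions). It is independent of Spark; the helper names support and lift are hypothetical and only illustrate the arithmetic behind the lift table above.

```python
# The five transactions from the FPGrowth example above.
transactions = [
    ["milk", "bread"],
    ["butter"],
    ["beer", "diapers"],
    ["milk", "bread", "butter"],
    ["bread"],
]


def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)


def lift(antecedent, consequent, transactions):
    """lift(X => Y) = sup(X U Y) / (sup(X) * sup(Y))."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))


# A few rules from the association-rules table.
rules = [
    (["milk"], ["bread"]),
    (["beer"], ["diapers"]),
    (["butter", "bread"], ["milk"]),
    (["bread"], ["butter"]),
]
for x, y in rules:
    print(x, "=>", y, round(lift(x, y, transactions), 8))
```

Because confidence(X => Y) = sup(X ∪ Y) / sup(X), the same value equals confidence divided by sup(Y), so code that already computes confidence only needs the support of each consequent to add lift.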

> Lift Calculation in Association Rule mining
> -------------------------------------------
>
>                 Key: SPARK-10697
>                 URL: https://issues.apache.org/jira/browse/SPARK-10697
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Yashwanth Kumar
>            Priority: Minor
>
> Lift is to be calculated for Association rule mining in 
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of association rules.
> Adding lift will help compare model efficiency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

