[ https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359623#comment-16359623 ]
Tristan Stevens commented on SPARK-10697:
-----------------------------------------

[~srowen] a big +1 from me to implementing this. Without lift, it becomes very difficult to assess whether a rule is even worth looking at.

As an example, using the dataset from Wikipedia, we currently get the following output:

{{from pyspark.ml.fpm import FPGrowth}}

{{df = spark.createDataFrame([}}
{{    (0, ["milk", "bread"]),}}
{{    (1, ["butter"]),}}
{{    (2, ["beer", "diapers"]),}}
{{    (3, ["milk", "bread", "butter"]),}}
{{    (4, ["bread"])}}
{{], ["id", "items"])}}

{{fpGrowth = FPGrowth(itemsCol="items", minSupport=0.2, minConfidence=0.2)}}
{{model = fpGrowth.fit(df)}}

{{# Display frequent itemsets.}}
{{model.freqItemsets.show()}}

|items|freq|
|[milk]|2|
|[milk, butter]|1|
|[milk, butter, br...|1|
|[milk, bread]|2|
|[diapers]|1|
|[diapers, beer]|1|
|[bread]|3|
|[butter]|2|
|[butter, bread]|1|
|[beer]|1|

{{# Display generated association rules.}}
{{model.associationRules.show()}}

|antecedent|consequent|confidence|
|[milk]|[butter]|0.5|
|[milk]|[bread]|1.0|
|[milk, butter]|[bread]|1.0|
|[beer]|[diapers]|1.0|
|[bread]|[milk]|0.6666666666666666|
|[bread]|[butter]|0.3333333333333333|
|[milk, bread]|[butter]|0.5|
|[diapers]|[beer]|1.0|
|[butter, bread]|[milk]|1.0|
|[butter]|[milk]|0.5|
|[butter]|[bread]|0.5|

However, confidence alone misses the detail that milk->bread is much less interesting than diapers->beer: both rules have a confidence of 1.0, so nothing in this output tells them apart.
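To make the difference concrete, here is a minimal plain-Python sketch that computes confidence and lift by hand for the five transactions above. It does not use Spark at all; the helper names {{support}}, {{confidence}} and {{lift}} are hypothetical and only for illustration, and lift is taken here as confidence divided by the relative support of the consequent.

```python
# Plain-Python check of confidence and lift on the example transactions.
# These helpers are illustrative only; they are not part of the Spark API.

transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """sup(x U y) / sup(x)"""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(x => y) / sup(y), i.e. sup(x U y) / (sup(x) * sup(y))"""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


if __name__ == "__main__":
    for x, y in [(["milk"], ["bread"]), (["diapers"], ["beer"])]:
        print(x, "=>", y,
              "confidence=%.2f" % confidence(x, y, transactions),
              "lift=%.2f" % lift(x, y, transactions))
```

Both rules come out with the same confidence, but the lift for diapers => beer is several times larger, because bread appears in three of the five transactions while beer appears in only one.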
When we add in lift, we get the following:

|antecedent|consequent|confidence|lift|
|[milk]|[butter]|0.5|1.25|
|[milk]|[bread]|1.0|1.66666666|
|[milk, butter]|[bread]|1.0|1.66666666|
|[beer]|[diapers]|1.0|5.0|
|[bread]|[milk]|0.6666666666666666|1.66666666|
|[bread]|[butter]|0.3333333333333333|0.83333333|
|[milk, bread]|[butter]|0.5|1.25|
|[diapers]|[beer]|1.0|5.0|
|[butter, bread]|[milk]|1.0|2.5|
|[butter]|[milk]|0.5|1.25|
|[butter]|[bread]|0.5|0.83333333|

So the proposal would be to add lift to the Rules class, calculated by

{{lift( x => y ) = sup(x U y) / (sup( x ) * sup( y ))}}

where {{sup}} is the relative support, i.e. the fraction of transactions that contain the itemset.

> Lift Calculation in Association Rule mining
> -------------------------------------------
>
>                 Key: SPARK-10697
>                 URL: https://issues.apache.org/jira/browse/SPARK-10697
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Yashwanth Kumar
>            Priority: Minor
>
> Lift is to be calculated for Association rule mining in
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule.
> Adding lift will help to compare the model efficiency.