support UDF

Makoto Yui (Jira) Wed, 27 Nov 2019 11:09:41 -0800


    [ 
https://issues.apache.org/jira/browse/HIVEMALL-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983834#comment-16983834
 ]


Makoto Yui edited comment on HIVEMALL-280 at 11/27/19 7:08 PM:
---------------------------------------------------------------


{noformat}
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)

"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.

SELECT
  A,
  B,
  A_B_count / tx_total as support,
  A_B_count / A_count AS confidence,
  (A_B_count / A_count) / (B_count / tx_total) AS lift 
FROM
  transaction 
ORDER BY
  1, 2;

Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php

txid    items
t1      {1, 3, 4}
t2      {2, 3, 5}
t3      {1, 2, 3, 5}
t4      {2, 5}
t5      {1, 2, 3, 5}

WITH tmp1 as (
 select
   arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
 from
   transaction
),
tmp2 as (
 select
   explode(rules) rule
 from 
  tmp1
)
select
  rule.item,
  rule.other,
  rule.supp,
  rule.conf,
  rule.lift
from
  tmp2;

explode items internally in UDAF process.
        a  b
    t1 {1, 3}, 
        t1 {1, 4}
        t1 {3, 4}
        t2 {2, 3}
        t2 {2, 5}
        t2 {3, 5}
        ....

1. compute tx_total by aggregation 
     long
2. compute item_count (a_count,b_count) by aggregation
     map<item, long>
3. compute ab_count by aggregation 
     map<pair<item,item>, long> 
4. emit (need to limit returning list size by supp/conf/lift value) 
         ordered_list<struct<item,item,supp,conf,lift>>
{noformat}



was (Author: myui):
supp(x=>y) = |xUy| / tx_total
conf(x=>y) = supp(x=>y)/supp(x) = |xUy| / |x|
lift(x=>y) = conf(x=>y)/supp(y) = (|xUy| / |x|) / (|y| / tx_total)

"A, B, A_count, B_count, AB_count, tx_total"
is required to compute support/confidense/lift for each A-B pair.

SELECT
  A,
  B,
  A_B_count / tx_total as support,
  A_B_count / A_count AS confidence,
  (A_B_count / A_count) / (B_count / tx_total) AS lift 
FROM
  transaction 
ORDER BY
  1, 2;

Another approach is taking the following table as the input.
Transaction table often takes the following form [1]
[1] https://www.philippe-fournier-viger.com/spmf/Apriori.php

txid    items
t1      {1, 3, 4}
t2      {2, 3, 5}
t3      {1, 2, 3, 5}
t4      {2, 5}
t5      {1, 2, 3, 5}

WITH tmp1 as (
 select
   arules(txid, items, '-min_supp 2 -topk 100 -order_by lift') as rules
 from
   transaction
),
tmp2 as (
 select
   explode(rules) rule
 from 
  tmp1
)
select
  rule.item,
  rule.other,
  rule.supp,
  rule.conf,
  rule.lift
from
  tmp2;

explode items internally in UDAF process.
        a  b
    t1 {1, 3}, 
        t1 {1, 4}
        t1 {3, 4}
        t2 {2, 3}
        t2 {2, 5}
        t2 {3, 5}
        ....

1. compute tx_total by aggregation 
     long
2. compute item_count (a_count,b_count) by aggregation
     map<item, long>
3. compute ab_count by aggregation 
     map<pair<item,item>, long> 
4. emit (need to limit returning list size by supp/conf/lift value) 
         ordered_list<struct<item,item,supp,conf,lift>>

> Support lift/confidence/support UDF
> -----------------------------------
>
>                 Key: HIVEMALL-280
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-280
>             Project: Hivemall
>          Issue Type: New Feature
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>            Priority: Minor
>
> Support lift/confidence/support UDAF
> [https://en.wikipedia.org/wiki/Lift_(data_mining])
> [https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce]
> [https://medium.com/@samratjain/explained-market-basket-analysis-using-sql-a7434f30e649]
> {code:java}
> select
>   item, other_item,
>   lift(...) as lift,
>   confidence (....) as confidence
> from
>   transaction
> group by
>   1, 2{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (HIVEMALL-280) Support lift/confidence/support UDF

Reply via email to