[ https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1288:
------------------------------------
Priority: Minor (was: Major)
> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
> Key: MADLIB-1288
> URL: https://issues.apache.org/jira/browse/MADLIB-1288
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Association Rules
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want to default itemset size to 10,
> so that assoc rules does not run for a long time.
> Details
> We have had some complaints about how long assoc rules takes to run. This
> could be due to the implementation or to wrong parameter settings by the
> user, but it may also be due to a combinatorial explosion in the number of
> generated rules.
> The R `arules` param `maxlen` defaults to 10
> https://cran.r-project.org/web/packages/arules/arules.pdf
> (see page 10, "apriori - mining associations with apriori"),
> which is the same as the MADlib param `max_itemset_size`
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
> The arules documentation explains:
> "If the minimum support is chosen too low for the dataset, then the
> algorithm will try to create an extremely large set of itemsets/rules. This
> will result in very long run time and eventually the process will run out of
> memory. To prevent this, the default maximal length of itemsets/rules is
> restricted to 10 items (via the parameter element `maxlen=10`)..."
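
To give a rough sense of the combinatorial explosion described above, here is a
minimal Python sketch that counts how many distinct itemsets exist for a given
number of items, with and without a size cap. The function name
`candidate_itemsets` is hypothetical and used only for illustration. The count
is an upper bound on the search space, not what Apriori actually enumerates,
since minimum-support pruning discards most candidates.

```python
from math import comb
from typing import Optional


def candidate_itemsets(n_items: int, max_size: Optional[int] = None) -> int:
    """Count non-empty itemsets of size 1..max_size over n_items distinct items.

    max_size=None means no cap, i.e. itemsets of every size up to n_items.
    This is a worst-case bound; support-based pruning is not modeled here.
    """
    upper = n_items if max_size is None else min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, upper + 1))


if __name__ == "__main__":
    # Compare the uncapped search space with a cap of 10 for a few catalog sizes.
    for n in (20, 30, 40):
        print(n, candidate_itemsets(n), candidate_itemsets(n, 10))
```

Without a cap the bound is 2^n - 1, so it doubles with every additional item,
while a cap of 10 keeps it polynomial in the number of items.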
> Interface
> Stays the same. The allowed values for `max_itemset_size` are:
> * any integer 2 or greater
> * if not specified, it is set to 10 (default)
> * `ALL` is also accepted as an input, which means generate itemsets of all
> sizes; this is the current behavior in 1.15.1
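
The rules in the list above could be sketched in Python as follows. This is only
an illustration of the proposed semantics, not MADlib's implementation: the
helper `resolve_max_itemset_size` and the constant `DEFAULT_MAX_ITEMSET_SIZE`
are hypothetical names, and treating `ALL` as a case-insensitive string is an
assumption, since the ticket does not say how that value is passed.

```python
from typing import Optional, Union

# Proposed default from this ticket (assumed constant name).
DEFAULT_MAX_ITEMSET_SIZE = 10


def resolve_max_itemset_size(value: Union[int, str, None]) -> Optional[int]:
    """Map a user-supplied max_itemset_size to an effective cap.

    Returns an integer cap, or None to mean "no cap" (itemsets of all sizes).
    """
    if value is None:
        # Not specified (NULL): fall back to the default of 10.
        return DEFAULT_MAX_ITEMSET_SIZE
    if isinstance(value, str):
        # Assumption: ALL arrives as a string and is matched case-insensitively.
        if value.strip().upper() == "ALL":
            return None
        raise ValueError("max_itemset_size must be an integer >= 2 or ALL")
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("max_itemset_size must be an integer >= 2 or ALL")
    if value < 2:
        raise ValueError("max_itemset_size must be 2 or more")
    return value
```

For example, `resolve_max_itemset_size(None)` gives the default cap, while
`resolve_max_itemset_size("ALL")` returns `None` to signal that no cap applies.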
> Acceptance
> 1) Set the `max_itemset_size` parameter to 100 and run on a data set that
> creates rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default max
> rule size limit is respected.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)