This is an automated email from the ASF dual-hosted git repository.

njayaram pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git
commit ed202b1020643cde011a0472ebbe84ed7d6b63a0
Author: Frank McQuillan <[email protected]>
AuthorDate: Wed Jul 24 13:37:47 2019 -0700

    Assoc Rules: Minor updates to user docs for new params
---
 .../modules/assoc_rules/assoc_rules.sql_in | 23 +++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
index 321b4fa..a7f639b 100644
--- a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
+++ b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
@@ -163,9 +163,12 @@ meets minimum confidence requirements.
 @note Beware of combinatorial explosion. The Apriori algorithm can potentially
 generate a huge number of rules, even for fairly simple data sets, resulting
-in run-times that are unreasonably long. To avoid this, it is recommended
+in run times that are unreasonably long. To avoid this, it is recommended
 to cap the maximum itemset size to a small number to start with, then
-increase it gradually. <em>Support</em> and <em>confidence</em> values are
+increase it gradually. Similarly, <em>max_LHS_size</em> and <em>max_RHS_size</em>
+limit the number of items on the LHS and RHS of the rules
+and can significantly reduce run times.
+<em>Support</em> and <em>confidence</em> values are
 parameters that can also be used to control rule generation.
 @anchor syntax
@@ -280,17 +283,16 @@ This generates all association rules that satisfy the specified minimum
  <dt>max_LHS_size (optional)</dt>
  <dd>INTEGER, default: NULL. Determines the maximum size of the left hand side of the rule. Must be 1 or more.
- This parameter can be used to reduce run time for data sets where itemset size is large,
- which is a common situation. If your query is not returning or is running too long,
- try using a lower value for this parameter.</dd>
+ This parameter can be used to reduce run time.</dd>
  <dt>max_RHS_size (optional)</dt>
  <dd>INTEGER, default: NULL. Determines the maximum size of the right hand side of the rule. Must be 1 or more.
- This parameter can be used to reduce run time for data sets where itemset size is large,
- which is a common situation. If your query is not returning or is running too long,
- try using a lower value for this parameter.</dd>
+ This parameter can be used to reduce run time. For example, setting to 1
+ can significantly reduce run time if this makes sense for your use case.
+ (The <em>apriori</em> algorithm in the R package <em>arules</em> [2] only
+ supports a RHS of 1.)</dd>
 </dl>
@@ -462,13 +464,16 @@ Result:
 The association rules function always creates a table named \c assoc_rules.
 Make a copy of this table before running the function again if you would
-like to keep multiple association rule tables.
+like to keep multiple association rule tables. This behavior will be improved
+in a later release.
 @anchor literature
 @literature
 [1] https://en.wikipedia.org/wiki/Apriori_algorithm
+[2] https://cran.r-project.org/web/packages/arules/arules.pdf
+
 @anchor related
 @par Related Topics
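
As a rough illustration of the combinatorial explosion that the updated note warns about, the sketch below counts how many rules of the form LHS => RHS can be formed from a set of distinct items, with and without caps on the two sides. It is not MADlib code: the function name count_candidate_rules and its parameters are hypothetical, and the counts are upper bounds over the item universe before any pruning by support or confidence. It assumes both sides of a rule are non-empty and disjoint.

```python
from math import comb


def count_candidate_rules(n_items, max_lhs=None, max_rhs=None):
    """Count rules LHS => RHS whose sides are non-empty, disjoint subsets
    of n_items distinct items, optionally capping the size of each side.

    Hypothetical helper for illustration only; None means "no cap",
    mirroring the NULL default described in the docs above.
    """
    lhs_cap = n_items if max_lhs is None else max_lhs
    rhs_cap = n_items if max_rhs is None else max_rhs
    total = 0
    # Leave at least one item for the RHS, so the LHS has at most n_items - 1.
    for lhs in range(1, min(lhs_cap, n_items - 1) + 1):
        # The RHS is drawn from the items not already used on the LHS.
        for rhs in range(1, min(rhs_cap, n_items - lhs) + 1):
            total += comb(n_items, lhs) * comb(n_items - lhs, rhs)
    return total


if __name__ == "__main__":
    print(count_candidate_rules(10))                        # 57002
    print(count_candidate_rules(10, max_rhs=1))             # 5110
    print(count_candidate_rules(10, max_lhs=1, max_rhs=1))  # 90
```

With only 10 items, capping the RHS at 1 cuts the candidate space by roughly a factor of 11, which is consistent with the patch's remark that setting max_RHS_size to 1 can significantly reduce run time. Actual savings depend on the data and on the support and confidence thresholds.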
