Frank McQuillan created MADLIB-1117:
---------------------------------------

             Summary: Add "columns to process per pass" as an optional param 
for summary()
                 Key: MADLIB-1117
                 URL: https://issues.apache.org/jira/browse/MADLIB-1117
             Project: Apache MADlib
          Issue Type: Improvement
            Reporter: Frank McQuillan
             Fix For: v1.12


Context

The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below a 1 GB limit.
The 15-column setting is somewhat arbitrary, since memory usage depends on many
things, including the data set and which params in summary() are set.  If more
columns could be processed per pass, summary() would run faster.
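
To illustrate why the per-pass column count matters, here is a minimal Python
sketch of grouping columns into passes. It is not the MADlib implementation:
the helper name split_into_passes and the parameter name cols_per_pass are
hypothetical, and only the default of 15 comes from this issue.

```python
def split_into_passes(columns, cols_per_pass=15):
    """Group column names into consecutive batches, one batch per pass.

    Hypothetical helper for illustration only; the default of 15 mirrors
    the current summary() setting described in this issue.
    """
    if cols_per_pass < 1:
        raise ValueError("cols_per_pass must be a positive integer")
    return [columns[i:i + cols_per_pass]
            for i in range(0, len(columns), cols_per_pass)]


# A made-up table with 40 columns.
columns = ["col%d" % i for i in range(40)]

default_passes = split_into_passes(columns)                  # 15 per pass
wider_passes = split_into_passes(columns, cols_per_pass=20)  # 20 per pass

print(len(default_passes), len(wider_passes))
```

With 40 columns, the default setting needs three passes (15, 15 and 10
columns), while 20 columns per pass needs only two. The trade-off is that each
pass holds more column state in memory.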

Story

As a MADlib developer, I want to add "columns to process per pass" as an
optional param for the summary() function.  Default: 15 columns (the current
setting).

Acceptance

1) Add new optional parameter and update docs.
2) Write and pass tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
