Frank McQuillan created MADLIB-1117:
---------------------------------------
Summary: Add "columns to process per pass" as an optional param for summary()
Key: MADLIB-1117
URL: https://issues.apache.org/jira/browse/MADLIB-1117
Project: Apache MADlib
Issue Type: Improvement
Reporter: Frank McQuillan
Fix For: v1.12
Context
The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below the 1 GB
limit. This limit is somewhat arbitrary, since memory usage depends on many
factors, including the data set and which summary() parameters are set. If
more columns could be processed per pass, summary() would need fewer passes
over the table and would run faster.
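To make the trade-off concrete, here is a minimal Python sketch of the pass arithmetic. It assumes each pass is one scan of the source table that summarizes a fixed-size batch of target columns. The function names below are hypothetical and do not reflect MADlib's actual implementation.

```python
import math
from typing import List, Sequence


def split_into_passes(columns: Sequence[str], cols_per_pass: int = 15) -> List[List[str]]:
    """Group target columns into batches; each batch stands for one scan
    of the source table (hypothetical model, not MADlib's actual code)."""
    if cols_per_pass < 1:
        raise ValueError("cols_per_pass must be at least 1")
    return [list(columns[i:i + cols_per_pass])
            for i in range(0, len(columns), cols_per_pass)]


def num_passes(n_columns: int, cols_per_pass: int = 15) -> int:
    """Number of table scans needed: ceil(n_columns / cols_per_pass)."""
    if cols_per_pass < 1:
        raise ValueError("cols_per_pass must be at least 1")
    return math.ceil(n_columns / cols_per_pass)


if __name__ == "__main__":
    cols = [f"c{i}" for i in range(40)]
    print(num_passes(len(cols)))                      # 3 passes at 15 columns per pass
    print(num_passes(len(cols), 40))                  # 1 pass if all 40 columns fit in memory
    print([len(b) for b in split_into_passes(cols)])  # [15, 15, 10]
```

Under this model, a 40-column table needs three scans at the current setting, but only one if memory allows all 40 columns in a single pass.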
Story
As a MADlib developer, I want to add "columns to process per pass" as an
optional parameter for the summary() function. Default: 15 columns (the
current setting).
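One possible way to handle the optional parameter is sketched below in Python. It assumes an omitted value (None, standing in for SQL NULL) falls back to the current default of 15, and that non-positive values are rejected. The parameter name n_cols_per_run and both helper functions are hypothetical placeholders, not the final MADlib interface.

```python
from typing import List, Optional, Sequence

DEFAULT_COLS_PER_PASS = 15  # current hard-coded value, kept as the default


def resolve_cols_per_pass(n_cols_per_run: Optional[int] = None) -> int:
    """Return the user's value, or the default when the optional
    parameter is omitted (None, analogous to SQL NULL)."""
    if n_cols_per_run is None:
        return DEFAULT_COLS_PER_PASS
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(n_cols_per_run, bool) or not isinstance(n_cols_per_run, int):
        raise TypeError("n_cols_per_run must be an integer")
    if n_cols_per_run < 1:
        raise ValueError("n_cols_per_run must be a positive integer")
    return n_cols_per_run


def plan_passes(target_columns: Sequence[str],
                n_cols_per_run: Optional[int] = None) -> List[List[str]]:
    """Split target columns into per-pass batches using the resolved setting."""
    step = resolve_cols_per_pass(n_cols_per_run)
    return [list(target_columns[i:i + step])
            for i in range(0, len(target_columns), step)]
```

For example, with 20 target columns, plan_passes(cols) yields two batches of 15 and 5 columns, while plan_passes(cols, 20) yields a single batch. Keeping the default at 15 preserves today's behavior for existing callers.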
Acceptance
1) Add the new optional parameter and update the docs.
2) Write and pass tests.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)