[ https://issues.apache.org/jira/browse/MADLIB-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan reassigned MADLIB-1117:
---------------------------------------
Assignee: Rahul Iyer
> Add "columns to process per pass" as an optional param for summary()
> --------------------------------------------------------------------
>
> Key: MADLIB-1117
> URL: https://issues.apache.org/jira/browse/MADLIB-1117
> Project: Apache MADlib
> Issue Type: Improvement
> Reporter: Frank McQuillan
> Assignee: Rahul Iyer
> Fix For: v1.12
>
>
> Context
> The summary() function
> http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
> currently processes 15 columns per pass to keep memory usage below the 1 GB
> limit. The 15-column figure is somewhat arbitrary, since memory usage depends
> on many things, including the data set and which summary() parameters are set.
> If more columns could be processed per pass, summary() would run faster.
> Story
> As a MADlib developer, I want to add "columns to process per pass" as an
> optional param for the summary() function. Default: 15 columns (the current
> setting).
> Acceptance
> 1) Add new optional parameter and update docs.
> 2) Write and pass tests.
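
For illustration only, the batching behavior described above could be sketched as follows. This is not MADlib's implementation: the function name split_into_passes and the parameter name cols_per_pass are hypothetical, and the sketch only shows how a configurable batch size with a default of 15 would change the number of passes.

```python
from typing import List, Sequence


def split_into_passes(columns: Sequence[str],
                      cols_per_pass: int = 15) -> List[List[str]]:
    """Group column names into batches, one batch per pass over the table.

    Hypothetical helper: cols_per_pass stands in for the proposed
    "columns to process per pass" option and defaults to 15.
    """
    if cols_per_pass < 1:
        raise ValueError("cols_per_pass must be a positive integer")
    cols = list(columns)
    # Slice the column list into consecutive chunks of at most cols_per_pass.
    return [cols[i:i + cols_per_pass]
            for i in range(0, len(cols), cols_per_pass)]


if __name__ == "__main__":
    columns = [f"col_{i}" for i in range(40)]
    # With the default of 15, 40 columns need three passes (15, 15, 10).
    print([len(batch) for batch in split_into_passes(columns)])
    # Raising the value to 40 handles every column in a single pass.
    print([len(batch) for batch in split_into_passes(columns, 40)])
```

A larger value means fewer passes over the data at the cost of more memory per pass, which is the trade-off the issue asks to expose to the user.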