[ https://issues.apache.org/jira/browse/MADLIB-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1117:
------------------------------------
Description:
Context
The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below a 1 GB
limit. This limit is somewhat arbitrary, since memory usage depends on many
things, including the data set and which params in summary() are set. If more
columns could be processed per pass, summary() would run faster.
Story
As a MADlib developer, I want to add "columns to process per pass" as an
optional param for the summary() function. Default: use 15 columns (which is
the current setting). Suggested param name: "columns_per_pass", though if you
have a better name, that's fine.
Acceptance
1) Add new optional parameter and update docs. Please add a note so it is
clear what this control does.
2) Write and pass tests.
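
A minimal sketch of the batching idea described above, assuming the columns
are simply split into consecutive groups of at most "columns_per_pass". The
helper names (column_batches, number_of_passes) are hypothetical and are not
taken from the MADlib source; only the default of 15 and the suggested
parameter name come from this ticket.

```python
from math import ceil
from typing import Iterator, List, Sequence

# Default taken from the ticket: summary() currently uses 15 columns per pass.
DEFAULT_COLUMNS_PER_PASS = 15


def column_batches(columns: Sequence[str],
                   columns_per_pass: int = DEFAULT_COLUMNS_PER_PASS
                   ) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `columns_per_pass` column names.

    Hypothetical helper: each yielded group stands for one pass over the table.
    """
    if columns_per_pass < 1:
        raise ValueError("columns_per_pass must be a positive integer")
    for start in range(0, len(columns), columns_per_pass):
        yield list(columns[start:start + columns_per_pass])


def number_of_passes(n_columns: int,
                     columns_per_pass: int = DEFAULT_COLUMNS_PER_PASS) -> int:
    """Number of passes needed to cover `n_columns` columns."""
    if columns_per_pass < 1:
        raise ValueError("columns_per_pass must be a positive integer")
    return ceil(n_columns / columns_per_pass)
```

Under this assumption, a 40-column table needs three passes at the default
setting and a single pass if columns_per_pass is raised to 40, at the cost of
holding more per-column state in memory during that pass.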
was:
Context
The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below a 1 GB
limit. This limit is somewhat arbitrary, since memory usage depends on many
things, including the data set and which params in summary() are set. If more
columns could be processed per pass, summary() would run faster.
Story
As a MADlib developer, I want to add "columns to process per pass" as an
optional param for the summary() function. Default: use 15 columns (which is
the current setting).
Acceptance
1) Add new optional parameter and update docs.
2) Write and pass tests.
> Add "columns to process per pass" as an optional param for summary()
> --------------------------------------------------------------------
>
> Key: MADLIB-1117
> URL: https://issues.apache.org/jira/browse/MADLIB-1117
> Project: Apache MADlib
> Issue Type: Improvement
> Reporter: Frank McQuillan
> Assignee: Rahul Iyer
> Fix For: v1.12