[ 
https://issues.apache.org/jira/browse/MADLIB-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1117:
------------------------------------
    Description: 
Context

The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below a 1 GB 
limit.  The 15-column figure is somewhat arbitrary, since memory usage depends 
on many things, including the data set and which params in summary() are set.  
If more columns per pass could be used, summary() would run faster.

Story

As a MADlib developer, I want to add "columns to process per pass" as an 
optional param for the summary() function.  Default: 15 columns (which is the 
current setting).  Suggested param name:  "columns_per_pass", though if you 
have a better name, that's fine.
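
To make the intent concrete, here is a minimal Python sketch of how a column 
list might be split into passes.  This is not the existing summary() code: the 
helper name split_into_passes and the 40-column table are hypothetical, and 
only the "columns_per_pass" name and the default of 15 come from this issue.

```python
from typing import Iterator, List, Sequence

# Default taken from this issue: summary() currently uses 15 columns per pass.
DEFAULT_COLUMNS_PER_PASS = 15


def split_into_passes(
    columns: Sequence[str],
    columns_per_pass: int = DEFAULT_COLUMNS_PER_PASS,
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `columns_per_pass` column names.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    if columns_per_pass < 1:
        raise ValueError("columns_per_pass must be a positive integer")
    for start in range(0, len(columns), columns_per_pass):
        yield list(columns[start:start + columns_per_pass])


# Hypothetical table with 40 columns.
columns = [f"col{i}" for i in range(40)]

default_passes = list(split_into_passes(columns))
wider_passes = list(split_into_passes(columns, columns_per_pass=20))

print(len(default_passes), len(wider_passes))  # prints "3 2"
```

With the default, the 40 columns need three passes (15, 15 and 10 columns); 
with 20 columns per pass they need two.  Fewer passes is where the speedup 
would come from, at the cost of more memory per pass.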

Acceptance

1) Add new optional parameter and update docs.  Please add a note so it is 
clear what this control does.
2) Write and pass tests.

  was:
Context

The summary() function
http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
currently processes 15 columns per pass to keep memory usage below 1 GB limit.  
This is a somewhat arbitrary limit since memory usage depends on many things 
including data set, and which params in summary() are set.  If more columns per 
pass could be used, summary() would run faster.

Story

As a MADlib developer, I want to add "columns to process per pass" as an 
optional param for summary() function.  Default: use 15 columns (which is the 
current setting).

Acceptance

1) Add new optional parameter and update docs.
2) Write and pass tests.


> Add "columns to process per pass" as an optional param for summary()
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1117
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1117
>             Project: Apache MADlib
>          Issue Type: Improvement
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>             Fix For: v1.12



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
