GitHub user jingyimei opened a pull request:

    https://github.com/apache/madlib/pull/256

    Minibatch Preprocessing: change default buffer size formula for grouping

    This commit changes the formula used to calculate the default buffer
    size. Previously, we used num_rows_processed/num_of_segments to
    estimate the data distribution in each segment. To adapt this to a
    grouping scenario, we now use avg_num_rows_processed/num_of_segments
    when there is more than one group of data. The other code changes
    follow from this change.
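
    As a rough illustration of the per-segment estimate described above,
    here is a minimal sketch. It is not the code from this pull request:
    the function name rows_per_segment and its arguments are hypothetical,
    and computing avg_num_rows_processed as the mean of the per-group row
    counts is an assumption. The sketch covers only the rows-per-segment
    estimate, not the full default buffer size calculation.

```python
def rows_per_segment(rows_per_group, num_segments):
    """Estimate how many rows land on each segment.

    rows_per_group: list of row counts, one entry per group
                    (a single entry when there is no grouping).
    num_segments:   number of database segments.

    Assumption: avg_num_rows_processed is the mean of the per-group counts.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if not rows_per_group:
        raise ValueError("rows_per_group must not be empty")
    avg_num_rows_processed = sum(rows_per_group) / len(rows_per_group)
    return avg_num_rows_processed / num_segments


# Without grouping, the average equals the total row count.
print(rows_per_segment([4000], 4))
# With two groups, the estimate is based on the average group size.
print(rows_per_segment([1000, 3000], 4))
```

    With a single group the result matches the old
    num_rows_processed/num_of_segments estimate, so under this assumption
    the two formulas differ only when grouping is used.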

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/minibatch-preprocessing-default_buffer_size

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #256
    
----
commit b33b00504ef46c7084b60f9c54d8f6797660542b
Author: Jingyi Mei <jmei@...>
Date:   2018-04-04T00:50:57Z

    Minibatch Preprocessing: change default buffer size formula to fit grouping
    
    This commit changes the formula used to calculate the default buffer
    size. Previously, we used num_rows_processed/num_of_segments to
    estimate the data distribution in each segment. To adapt this to a
    grouping scenario, we now use avg_num_rows_processed/num_of_segments
    when there is more than one group of data. The other code changes
    follow from this change.

----

