GitHub user jingyimei opened a pull request:
https://github.com/apache/madlib/pull/256
Minibatch Preprocessing: change default buffer size formula for grouping
This commit changes the formula used to calculate the default buffer
size. Previously, we used num_rows_processed/num_of_segments to estimate
the data distribution in each segment. To handle the grouping
scenario, we now use avg_num_rows_processed/num_of_segments to estimate
the data distribution when there is more than one group of data. The
other code changes follow from this change.
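As a rough illustration of the change described above, the per-segment row estimate could be sketched as follows. This is not the code from the pull request: the function name rows_per_segment and the num_groups parameter are hypothetical, and the sketch assumes avg_num_rows_processed means the average number of rows per group. How this estimate feeds into the final default buffer size is not shown here.

```python
def rows_per_segment(num_rows_processed, num_of_segments, num_groups=1):
    """Estimate how many rows each segment holds (hypothetical helper).

    With a single group this reduces to the old estimate,
    num_rows_processed / num_of_segments. With several groups, the
    average number of rows per group is used instead (an assumption
    about what avg_num_rows_processed means).
    """
    if num_of_segments <= 0:
        raise ValueError("num_of_segments must be positive")
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")

    # Average rows per group; equal to num_rows_processed when num_groups == 1.
    avg_num_rows_processed = num_rows_processed / num_groups
    return avg_num_rows_processed / num_of_segments
```

For example, with 1200 rows on 4 segments, the estimate is 300 rows per segment without grouping and 100 rows per segment when the rows are split into 3 groups.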
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/minibatch-preprocessing-default_buffer_size
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/256.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #256
----
commit b33b00504ef46c7084b60f9c54d8f6797660542b
Author: Jingyi Mei <jmei@...>
Date: 2018-04-04T00:50:57Z
Minibatch Preprocessing: change default buffer size formula to fit grouping
This commit changes the formula used to calculate the default buffer
size. Previously, we used num_rows_processed/num_of_segments to estimate
the data distribution in each segment. To handle the grouping
scenario, we now use avg_num_rows_processed/num_of_segments to estimate
the data distribution when there is more than one group of data. The
other code changes follow from this change.
----