GitHub user jingyimei opened a pull request: https://github.com/apache/madlib/pull/256

    Minibatch Preprocessing: change default buffer size formula for grouping

    This commit changes the formula used to calculate the default buffer
    size. Previously, we used num_rows_processed/num_of_segments to indicate
    how data is distributed across segments. To adapt this to a grouping
    scenario, we now use avg_num_rows_processed/num_of_segments to indicate
    the data distribution when there is more than one group of data. The
    other code changes follow from this change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/minibatch-preprocessing-default_buffer_size

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #256

----
commit b33b00504ef46c7084b60f9c54d8f6797660542b
Author: Jingyi Mei <jmei@...>
Date:   2018-04-04T00:50:57Z

    Minibatch Preprocessing: change default buffer size formula to fit grouping

    This commit changes the formula used to calculate the default buffer
    size. Previously, we used num_rows_processed/num_of_segments to indicate
    how data is distributed across segments. To adapt this to a grouping
    scenario, we now use avg_num_rows_processed/num_of_segments to indicate
    the data distribution when there is more than one group of data. The
    other code changes follow from this change.

----
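
For readers who want the arithmetic spelled out, here is a minimal sketch of the two ratios described in the commit message. It is not the MADlib implementation: the function names are hypothetical, and it assumes that avg_num_rows_processed means the mean number of processed rows per group. The message does not show how the ratio is turned into the final default buffer size, so that step is left out.

```python
def rows_per_segment_ungrouped(num_rows_processed, num_of_segments):
    """Previous ratio: total processed rows divided by segment count."""
    if num_of_segments <= 0:
        raise ValueError("num_of_segments must be positive")
    return float(num_rows_processed) / num_of_segments


def rows_per_segment_grouped(rows_per_group, num_of_segments):
    """New ratio: average processed rows per group divided by segment count.

    rows_per_group is a list with one row count per group (hypothetical
    input shape; assumed here for illustration only).
    """
    if num_of_segments <= 0:
        raise ValueError("num_of_segments must be positive")
    if not rows_per_group:
        raise ValueError("rows_per_group must not be empty")
    # Assumption: avg_num_rows_processed is the mean row count per group.
    avg_num_rows_processed = float(sum(rows_per_group)) / len(rows_per_group)
    return avg_num_rows_processed / num_of_segments


if __name__ == "__main__":
    groups = [600, 200, 400]  # three groups, 1200 rows in total
    segments = 4
    print(rows_per_segment_ungrouped(sum(groups), segments))
    print(rows_per_segment_grouped(groups, segments))
```

With a single group the two ratios coincide. With several groups the grouped ratio is smaller, since each group is processed on its own and holds only a share of the total rows.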