GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/239
Balance Sample: Add support for grouping
JIRA: MADLIB-1168
This commit adds grouping support for balanced sampling.
Grouping is implemented as a loop over the existing logic,
with the sampling for each group run independently.
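The per-group loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MADlib's balance_sample code. Rows are modeled as dicts. The helper names `level_frequency_distribution` and `balance_sample_grouped` are hypothetical. Undersampling to the smallest class is only one possible balancing strategy. The point is that class counts and the sampling target are computed separately within each group.

```python
import random
from collections import Counter, defaultdict


def level_frequency_distribution(rows, class_col, group_cols):
    """Hypothetical helper: count each class level separately within every group."""
    freq = defaultdict(Counter)
    for row in rows:
        group_key = tuple(row[c] for c in group_cols)
        freq[group_key][row[class_col]] += 1
    return freq


def balance_sample_grouped(rows, class_col, group_cols, seed=0):
    """Hypothetical sketch: undersample every class to the smallest class size,
    running the same logic independently for each group."""
    rng = random.Random(seed)
    freq = level_frequency_distribution(rows, class_col, group_cols)

    # Bucket rows by (group, class level) so each group can be sampled on its own.
    buckets = defaultdict(list)
    for row in rows:
        group_key = tuple(row[c] for c in group_cols)
        buckets[(group_key, row[class_col])].append(row)

    sampled = []
    for group_key, levels in freq.items():
        # The target size comes only from this group's own class counts.
        target = min(levels.values())
        for level in levels:
            sampled.extend(rng.sample(buckets[(group_key, level)], target))
    return sampled
```

With `group_cols=[]` every row gets the empty tuple as its key, so the whole table is treated as one group. That matches the ungrouped behavior, which is why grouping can be layered on as a loop over the existing logic.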
Closes #239
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/balanced-datasets-grouping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/239.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #239
----
commit 6c5fcfb375eaf7dc68e1ede4aca2a47b8e55309b
Author: Rahul Iyer <riyer@...>
Date: 2018-02-24T02:45:32Z
Clean code + conform to PEP8
commit a5a0c1e2c851a923b9eb550d42dfc594b4635c64
Author: Rahul Iyer <riyer@...>
Date: 2018-02-26T23:01:34Z
Add a Collate plpy results function
commit 8e8eca2960207ca0317ded68608c660b8d4ddb55
Author: Rahul Iyer <riyer@...>
Date: 2018-03-02T00:44:54Z
Add grouping in get_level_frequency_distribution
commit cad4a5be732f89504ff62f4d9e68367d174fc322
Author: Rahul Iyer <riyer@...>
Date: 2018-03-07T07:07:00Z
Ensure subqueries are filtering groups and using right count
commit 39dd6f436bb9b8d505be5204226dcc3053b1b4df
Author: Rahul Iyer <riyer@...>
Date: 2018-03-07T07:07:14Z
Update install check to include grouping
commit d61ff28290dad27ead0f1c68d740a8ccb79f4aec
Author: Rahul Iyer <riyer@...>
Date: 2018-03-07T07:07:27Z
Update documentation with grouping examples
----