GitHub user Swatisoni opened a pull request:
https://github.com/apache/madlib/pull/223
Balance datasets: re-sampling technique
JIRA: MADLIB-1168
Additional Authors:
Orhan Kislal [email protected]
Jingyi Mei [email protected]
Implements Phase 1 and Phase 2 of balanced datasets, which perform
balanced sampling using the following re-sampling techniques:
1. Under-sampling the majority class(es), with and without
   replacement
2. Over-sampling the minority class
3. Combining over- and under-sampling
   - Uniform sampling of all classes (default case)
4. Creating ensemble balanced sets
   - Re-sampling given a comma-delimited string of specific classes
     and their respective sample sizes
5. Install-check (IC) tests
Balanced sampling with grouping functionality will be implemented in Phase 3.
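For reviewers who want a quick mental model of the techniques listed above, here is a minimal Python sketch of class-balanced re-sampling. It is not the code in this pull request and does not use the MADlib API: the names balance_sample and parse_class_sizes, the "label=count" string format, the mode names, and the rule that unlisted classes keep all their rows are assumptions made only for illustration.

```python
import random
from collections import Counter, defaultdict


def parse_class_sizes(spec):
    """Parse a comma-delimited string such as 'a=3, b=5' into {'a': 3, 'b': 5}.

    The 'label=count' format is an assumption made for this sketch.
    """
    sizes = {}
    for part in spec.split(","):
        label, _, count = part.partition("=")
        sizes[label.strip()] = int(count)
    return sizes


def balance_sample(rows, class_of, class_sizes="uniform",
                   with_replacement=False, seed=None):
    """Return a re-sampled list of rows with the requested class sizes.

    class_sizes is 'undersample', 'oversample', 'uniform', or an explicit
    string accepted by parse_class_sizes (all hypothetical mode names).
    """
    rows = list(rows)
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)
    counts = {label: len(members) for label, members in by_class.items()}

    # Decide how many rows each class should contribute.
    if class_sizes == "undersample":
        targets = dict.fromkeys(counts, min(counts.values()))
    elif class_sizes == "oversample":
        targets = dict.fromkeys(counts, max(counts.values()))
    elif class_sizes == "uniform":
        # Keep the total roughly unchanged, split evenly across classes.
        targets = dict.fromkeys(counts, len(rows) // len(counts))
    else:
        requested = parse_class_sizes(class_sizes)
        unknown = set(requested) - set(counts)
        if unknown:
            raise ValueError(f"unknown class labels: {sorted(unknown)}")
        # Assumption: classes not named in the string keep all their rows.
        targets = {**counts, **requested}

    sample = []
    for label, members in by_class.items():
        want = targets[label]
        if want > len(members):
            # Over-sampling: keep every row, then repeat random rows.
            extra = rng.choices(members, k=want - len(members))
            sample.extend(members + extra)
        elif with_replacement:
            sample.extend(rng.choices(members, k=want))
        else:
            sample.extend(rng.sample(members, want))
    return sample


def class_counts(sample):
    """Count rows per class label, assuming the label is the first field."""
    return Counter(row[0] for row in sample)


# Ten rows of class 'a' and two rows of class 'b'.
data = [("a", i) for i in range(10)] + [("b", i) for i in range(2)]
for mode in ("undersample", "oversample", "uniform", "a=3"):
    result = balance_sample(data, lambda r: r[0], class_sizes=mode, seed=0)
    print(mode, dict(class_counts(result)))
```

In this sketch the "uniform" mode can both shrink the majority class and grow the minority class in one pass, which is one way to read technique 3 above. Calling it several times with different seeds yields several balanced sets, a rough analogue of technique 4.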
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Swatisoni/madlib balanced_sets_final
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/223.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #223
----
commit 3b2d1f18b9cf5ef8f78669678d82dc29cd11812b
Author: Swatisoni <soniswati.2010@...>
Date: 2018-01-10T20:07:36Z
Balance datasets: re-sampling technique
----
---