ReneEnjilian opened a new pull request, #2133:
URL: https://github.com/apache/systemds/pull/2133
This PR adds a new builtin function for ADASYN (Adaptive Synthetic Sampling), which generates synthetic minority-class samples to counter class imbalance in binary classification datasets. The method itself is implemented, but I still need to add the Java test class.

I manually tested the method on a real dataset, [Pima Indians Diabetes](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database), which the authors also used in the original paper. The generated synthetic data looked very reasonable when compared with the original data.

I will expand this description once I have added the test cases and run further experiments on the other datasets mentioned in the paper.
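For reviewers less familiar with ADASYN, the core procedure from the original paper can be sketched as follows. This is a minimal Python/NumPy illustration, not the DML builtin from this PR. It assumes binary labels, Euclidean k-nearest neighbours found by brute force, and a hypothetical function `adasyn` with hypothetical parameters `minority_label`, `beta`, `k` and `seed`. It also omits the paper's imbalance-threshold check and uses an assumed uniform fallback when no minority point has majority neighbours.

```python
import numpy as np


def adasyn(X, y, minority_label=1, beta=1.0, k=5, seed=None):
    """Hypothetical brute-force ADASYN sketch for binary labels.

    Returns only the synthetic minority rows, not the original data.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]
    n_min, n_maj = len(min_idx), len(X) - len(min_idx)

    # Total number of synthetic samples: G = (m_l - m_s) * beta
    G = int(round((n_maj - n_min) * beta))
    if G <= 0 or n_min < 2:
        return np.empty((0, X.shape[1]))

    # k nearest neighbours of each minority sample in the whole dataset
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), min_idx] = np.inf  # a point is not its own neighbour
    k_all = min(k, len(X) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]

    # r_i = share of majority samples among those neighbours
    r = (y[nn_all] != minority_label).sum(axis=1) / k_all
    if r.sum() == 0:
        r = np.ones(n_min)  # assumption: fall back to uniform weights
    # g_i = normalized r_i * G, rounded per sample
    g = np.rint(r / r.sum() * G).astype(int)

    # k nearest minority neighbours of each minority sample (for interpolation)
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, n_min - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]

    synthetic = []
    for i in range(n_min):
        for _ in range(g[i]):
            j = nn_min[i, rng.integers(k_min)]  # random minority neighbour
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X.shape[1])
```

The weighting by `r_i` is what distinguishes ADASYN from plain SMOTE. Minority points surrounded by many majority neighbours, which are harder to learn, receive more synthetic samples. Because each `g_i` is rounded independently, the total number of generated rows can differ slightly from `G`.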