[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345447#comment-15345447 ]
Yuewei Na commented on SPARK-9478:
----------------------------------

Hi [~sethah]. Actually, the code in my PR has been in use at our company for some time, and we recently decided to open-source it. We used this implementation because the current version has no class weight support and we have practical needs for it. Compared with sample weights, our version uses less memory, since it does not need an extra column to store sample weights.

I also browsed the APIs and implementations of the ensemble methods in scikit-learn. It is true that class weights are integrated with sample weights there. Given that, and the need for sample weights in various other models, I agree that functionality supporting sample weights is the better choice.

So now I have some thoughts on this problem:

1. I agree with you on implementing a mechanism to support class weights. I think it will reduce users' effort to achieve their goal.
2. Since my PR is a lightweight version and has been tested and used at our company for some time, we could review and merge my PR into the master branch first, to make it available to users who need it. We can remove it once nothing blocks the instance weight version, while preserving the same interface for setting the class weights.

We could create a new JIRA that separates the problem 'adding class weights' from the problem 'adding instance weights'. At the very least, either the title of the current JIRA should be changed or a new JIRA should be created.

> Add class weights to Random Forest
> ----------------------------------
>
>                 Key: SPARK-9478
>                 URL: https://issues.apache.org/jira/browse/SPARK-9478
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.4.1
>            Reporter: Patrick Crenshaw
>
> Currently, this implementation of random forest does not support class weights.
> Class weights are important when there is imbalanced training data or the evaluation metric of a classifier is imbalanced (e.g. true positive rate at some false positive threshold).
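
For readers following the class weight versus sample weight discussion, here is a minimal sketch in plain Python of how the two relate. It is not Spark's API and not the code from the PR. The helper names (`balanced_class_weights`, `to_instance_weights`, `weighted_gini`) and the toy labels are hypothetical, chosen only for illustration. The weighting formula is the "balanced" heuristic used by scikit-learn, n_samples / (n_classes * count_of_class). The sketch shows that class weights are a small per-class map, that they expand into one weight per row when expressed as sample weights, and how they change a weighted Gini impurity on imbalanced data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """One weight per class: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def to_instance_weights(labels, class_weights):
    """Expand per-class weights into one weight per row (the extra 'column')."""
    return [class_weights[y] for y in labels]


def weighted_gini(labels, weights):
    """Gini impurity where each row contributes its weight instead of 1."""
    totals = {}
    for y, w in zip(labels, weights):
        totals[y] = totals.get(y, 0.0) + w
    total = sum(totals.values())
    return 1.0 - sum((t / total) ** 2 for t in totals.values())


# Hypothetical imbalanced data: eight rows of class 0, two rows of class 1.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

class_weights = balanced_class_weights(labels)        # one entry per class
instance_weights = to_instance_weights(labels, class_weights)  # one entry per row

unweighted = weighted_gini(labels, [1.0] * len(labels))
weighted = weighted_gini(labels, instance_weights)
```

In this sketch the class weight map has two entries while the instance weight list has one entry per row, which is the memory difference mentioned in the comment. An instance weight mechanism can reproduce any class weighting through the expansion step, which is why the two features can share one interface.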