[ https://issues.apache.org/jira/browse/SPARK-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-27925.
----------------------------------
    Resolution: Not A Problem

> Better control numBins of curves in BinaryClassificationMetrics
> ---------------------------------------------------------------
>
>                 Key: SPARK-27925
>                 URL: https://issues.apache.org/jira/browse/SPARK-27925
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> For large datasets with tens of thousands of partitions, the current curve
> down-sampling method tends to generate many more bins than the value set by
> #numBins.
> This is because the current implementation groups within each partition, so
> each partition contains at least one bin.
> A more reasonable approach is to move the grouping op forward into the sort op;
> then we can directly set #bins to #partitions and treat each partition as
> one bin.
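The bin-count effect described in the issue can be illustrated with a small back-of-the-envelope model. This is a minimal sketch in plain Python, not Spark's actual code: the function names, the `grouping = total // num_bins` chunk size, and the "skip down-sampling when grouping < 2" rule are assumptions made for illustration only.

```python
import math


def bins_with_per_partition_grouping(partition_sizes, num_bins):
    """Count bins when points are chunked inside each partition separately.

    Hypothetical model: every partition is cut into chunks of `grouping`
    points, so any non-empty partition yields at least one bin.
    """
    total = sum(partition_sizes)
    grouping = total // num_bins  # target number of points per bin
    if grouping < 2:
        # Assumption: too few points to down-sample, every point is kept.
        return total
    return sum(math.ceil(n / grouping) for n in partition_sizes if n > 0)


def bins_with_global_grouping(partition_sizes, num_bins):
    """Count bins when chunking ignores partition boundaries."""
    total = sum(partition_sizes)
    grouping = total // num_bins
    if grouping < 2:
        return total
    return math.ceil(total / grouping)


# 10,000 partitions holding 100 points each, with 1,000 bins requested.
sizes = [100] * 10_000
requested = 1_000

per_partition = bins_with_per_partition_grouping(sizes, requested)
global_bins = bins_with_global_grouping(sizes, requested)
print(per_partition, global_bins)
```

Under these assumptions the per-partition model produces 10,000 bins (one per partition) against 1,000 requested, while grouping across partition boundaries produces exactly 1,000.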