Doris Xin created SPARK-2082:
--------------------------------

             Summary: Stratified sampling implementation in PairRDDFunctions
                 Key: SPARK-2082
                 URL: https://issues.apache.org/jira/browse/SPARK-2082
             Project: Spark
          Issue Type: New Feature
            Reporter: Doris Xin

Implementation of stratified sampling that guarantees an exact sample size of sum(math.ceil(S_i * samplingRate)), taken over all strata, where S_i is the size of stratum i.
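
To make the size guarantee concrete, here is a minimal local sketch in plain Python. It is not the Spark implementation and does not use PairRDDFunctions; the function name stratified_sample, its parameters, and the in-memory grouping are all assumptions made for illustration. It treats each key as a stratum and draws exactly ceil(S_i * sampling_rate) values from it without replacement.

```python
import math
import random
from collections import defaultdict


def stratified_sample(pairs, sampling_rate, seed=None):
    """Hypothetical helper: exact-size stratified sample of (key, value) pairs.

    Each distinct key is one stratum. From a stratum of size S_i we draw
    exactly ceil(S_i * sampling_rate) values without replacement.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in [0, 1]")

    rng = random.Random(seed)

    # Group values by key so each stratum's size S_i is known up front.
    strata = defaultdict(list)
    for key, value in pairs:
        strata[key].append(value)

    sample = []
    for key, values in strata.items():
        # Exact per-stratum sample size. Floating-point rounding can push a
        # product such as 30 * 0.1 just above the integer, so rates that are
        # not exactly representable may yield one extra element.
        k = min(len(values), math.ceil(len(values) * sampling_rate))
        sample.extend((key, v) for v in rng.sample(values, k))
    return sample


if __name__ == "__main__":
    data = [("a", i) for i in range(10)] + [("b", i) for i in range(3)]
    result = stratified_sample(data, 0.5, seed=42)
    # Stratum "a" contributes ceil(10 * 0.5) = 5 pairs,
    # stratum "b" contributes ceil(3 * 0.5) = 2 pairs.
    print(len(result))
```

With strata of sizes 10 and 3 and a rate of 0.5, the sample always contains 5 + 2 = 7 pairs; only which pairs are chosen depends on the seed. A distributed version cannot group each stratum in memory like this, so it would need a different strategy to reach the same exact counts.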