[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-2032:
---------------------------------

    Priority: Minor  (was: Major)

> Add an RDD.samplePartitions method for partition-level sampling
> ---------------------------------------------------------------
>
>                 Key: SPARK-2032
>                 URL: https://issues.apache.org/jira/browse/SPARK-2032
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> This would allow us to sample a percent of the partitions and not have to
> materialize all of them. It's less uniform but much faster and may be useful
> for quickly exploring data.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
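A minimal sketch of the partition-level sampling idea described in the issue. It is written in plain Python rather than Scala so that it runs without a Spark installation. Each partition is modelled as a hypothetical zero-argument function that produces its records only when called, loosely mirroring how an RDD partition is computed lazily. `sample_partitions` and `collect` are illustrative names, not Spark APIs, and the issue does not specify a signature for the proposed `RDD.samplePartitions`. Here a fixed number of whole partitions, `round(fraction * n)`, is chosen. The partitions that are not selected are never called, so their records are never materialized.

```python
import random
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for an RDD partition: a zero-argument function that
# yields the partition's records only when it is called.
Partition = Callable[[], Iterator[T]]


def sample_partitions(
    partitions: List[Partition], fraction: float, seed: Optional[int] = None
) -> List[Partition]:
    """Pick round(fraction * n) whole partitions at random, keeping their order.

    Only references to partitions are chosen here; no partition is computed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the same sample can be reproduced
    k = round(fraction * len(partitions))
    chosen = sorted(rng.sample(range(len(partitions)), k))
    return [partitions[i] for i in chosen]


def collect(partitions: List[Partition]) -> List[T]:
    """Materialize the given partitions one after another."""
    return [record for compute in partitions for record in compute()]
```

The speed-up comes from skipping unselected partitions entirely. The cost is the non-uniformity the issue mentions: records from the same partition arrive together, so the result is a cluster sample rather than a uniform record-level sample like `RDD.sample`. Within Spark itself, the developer API `PartitionPruningRDD.create(rdd, partitionFilterFunc)` can drop partitions by index, which is one plausible building block for such a method.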