[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-2032:
-----------------------------
Target Version/s: (was: 1.1.0)
> Add an RDD.samplePartitions method for partition-level sampling
> ---------------------------------------------------------------
>
> Key: SPARK-2032
> URL: https://issues.apache.org/jira/browse/SPARK-2032
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Matei Zaharia
> Assignee: Prashant Sharma
> Priority: Minor
>
> This would allow us to sample a percentage of an RDD's partitions without
> having to materialize all of them. The result is less uniform than sampling
> individual records, but it is much faster and may be useful for quickly
> exploring data.
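
To illustrate the idea in the description, here is a minimal sketch in plain Python rather than Spark. It is not the proposed RDD.samplePartitions implementation. The names sample_partitions and make_partition are hypothetical, and each "partition" is modeled as a function that produces its records only when called, standing in for Spark's lazy partition computation. The point it shows is that only the chosen partitions are ever materialized.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence

# Hypothetical stand-in for a lazily computed partition: calling it
# materializes the partition's records.
Partition = Callable[[], Iterator[int]]


def sample_partitions(
    partitions: Sequence[Partition],
    fraction: float,
    seed: Optional[int] = None,
) -> List[Partition]:
    """Pick a fraction of whole partitions without computing any of them."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    # Number of partitions to keep, rounded to the nearest integer.
    k = round(fraction * len(partitions))
    # Sample partition indices without replacement; keep original order.
    chosen = sorted(rng.sample(range(len(partitions)), k))
    return [partitions[i] for i in chosen]


# Records which partitions were actually computed.
materialized: List[int] = []


def make_partition(index: int, size: int = 1000) -> Partition:
    def compute() -> Iterator[int]:
        materialized.append(index)  # side effect marks materialization
        return iter(range(index * size, (index + 1) * size))
    return compute


partitions = [make_partition(i) for i in range(10)]
sampled = sample_partitions(partitions, fraction=0.3, seed=42)

# Only the sampled partitions are computed here.
records = [record for part in sampled for record in part()]
print(len(sampled), len(records), len(materialized))
```

Because whole partitions are kept or dropped, records that share a partition are always sampled together, which is why the sample is less uniform than per-record sampling.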
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)