[jira] [Resolved] (SPARK-31140) Support Quick sample in RDD

Hyukjin Kwon (Jira) Sun, 22 Mar 2020 23:09:46 -0700


     [ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon resolved SPARK-31140.
----------------------------------
    Resolution: Won't Fix

> Support Quick sample in RDD
> ---------------------------
>
>                 Key: SPARK-31140
>                 URL: https://issues.apache.org/jira/browse/SPARK-31140
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: deshanxiao
>            Priority: Minor
>
> RDD.sample uses a *filter* to pick the data we need. This means that if the
> raw data is very large, we spend a long time reading all of it. We could
> filter the raw partitions first to speed up sampling.
> {code:scala}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = {
>     val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
>     val thisSampler = sampler.clone
>     thisSampler.setSeed(split.seed)
>     thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}
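
The trade-off described in the issue can be sketched without Spark. The following is a minimal illustration only, not Spark's implementation and not a patch proposed in this ticket: `SampleSketch`, `Partitions`, `sampleByFilter` and `sampleByPartition` are hypothetical names, and a `Seq[Seq[T]]` stands in for an RDD's partitions. It contrasts element-level sampling, which reads every element, with partition-level sampling, which reads only the partitions it keeps.

```scala
import scala.util.Random

object SampleSketch {
  // Hypothetical stand-in for an RDD: each inner Seq is one partition.
  type Partitions[T] = Seq[Seq[T]]

  // Element-level Bernoulli sampling: every element of every partition is
  // read, and each one is kept with probability `fraction`.
  // Returns the sample and the number of elements that were read.
  def sampleByFilter[T](data: Partitions[T], fraction: Double, seed: Long): (Seq[T], Int) = {
    var read = 0
    val sampled = data.zipWithIndex.flatMap { case (part, idx) =>
      val rng = new Random(seed + idx) // assumed per-partition seed
      part.filter { _ => read += 1; rng.nextDouble() < fraction }
    }
    (sampled, read)
  }

  // Partition-level sampling: decide once per partition whether to read it.
  // Partitions that are skipped are never iterated.
  def sampleByPartition[T](data: Partitions[T], fraction: Double, seed: Long): (Seq[T], Int) = {
    val rng = new Random(seed)
    val kept = data.filter(_ => rng.nextDouble() < fraction)
    (kept.flatten, kept.map(_.size).sum)
  }
}
```

Note that the two approaches are not statistically equivalent: partition-level sampling keeps or drops whole partitions, so the result is a cluster sample rather than an independent per-element sample.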



--
This message was sent by Atlassian Jira
(v8.3.4#803005)