[ 
https://issues.apache.org/jira/browse/SPARK-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6370:
-----------------------------
    Assignee: Marko Bonaci

> Improve documentation of RDD.sample() fraction's effect
> -------------------------------------------------------
>
>                 Key: SPARK-6370
>                 URL: https://issues.apache.org/jira/browse/SPARK-6370
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.3.0, 1.2.1
>         Environment: Ubuntu 14.04 64-bit, spark-1.3.0-bin-hadoop2.4
>            Reporter: Marko Bonaci
>            Assignee: Marko Bonaci
>            Priority: Minor
>              Labels: PoissonSampler, sample, sampler
>             Fix For: 1.4.0
>
>
> Here's the repl output:
> {{code:java}}
> scala> uniqueIds.collect
> res10: Array[String] = Array(4, 8, 21, 80, 20, 98, 42, 15, 48, 36, 90, 46, 
> 55, 16, 31, 71, 9, 50, 28, 61, 68, 85, 12, 94, 38, 77, 2, 11, 10)
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[22] at sample 
> at <console>:27
> scala> swr.count
> res17: Long = 16
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[23] at sample 
> at <console>:27
> scala> swr.count
> res18: Long = 8
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[24] at sample 
> at <console>:27
> scala> swr.count
> res19: Long = 18
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[25] at sample 
> at <console>:27
> scala> swr.count
> res20: Long = 15
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[26] at sample 
> at <console>:27
> scala> swr.count
> res21: Long = 11
> scala> val swr = uniqueIds.sample(true, 0.5)
> swr: org.apache.spark.rdd.RDD[String] = PartitionwiseSampledRDD[27] at sample 
> at <console>:27
> scala> swr.count
> res22: Long = 10
> {{code}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to