Another alternative:

rdd.take(1000).drop(100) // pulls the first 1000 elements to the driver, then drops the first 100; this also preserves ordering

Note, however, that this can lead to an OOM error if the data you're
taking is too large. If you want to perform some operation sequentially
on the driver and don't care about performance, you could do something
similar to what Mohammed suggested:

val filteredRDD = ... // same zipWithIndex + filter as in the previous post

// toLocalIterator streams the data to the driver one partition at a time,
// so this loop runs sequentially on the driver rather than on the executors
filteredRDD.toLocalIterator.foreach { elem =>
  // do something with elem, e.g. save it to a database
}
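
For completeness, here is a minimal end-to-end sketch of that approach for
the 100-to-1000 range from the original question. Assumptions: it runs in
spark-shell (so `sc` already exists), uses a toy parallelized range as
stand-in data, and the println is just a placeholder for whatever
driver-side write (e.g. your database call) you actually need:

// toy stand-in data; use your real RDD here
val rdd = sc.parallelize(1 to 10000)

// pair each element with its 0-based index, keep indices in [100, 1000),
// then drop the index again
val sliceRDD = rdd.zipWithIndex()
  .filter { case (_, index) => index >= 100 && index < 1000 }
  .map { case (element, _) => element }

// toLocalIterator streams one partition at a time to the driver,
// so only a single partition has to fit in driver memory (unlike collect())
sliceRDD.toLocalIterator.foreach { elem =>
  println(elem) // placeholder for e.g. a database write
}

As Mohammed's warning below points out, the ordering of elements in an RDD
is not guaranteed, so which records end up at indices 100 to 1000 is only
well defined if the RDD has been explicitly ordered, e.g. with sortBy.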



On Tue, Feb 9, 2016 at 2:56 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
> You can do something like this:
>
> val indexedRDD = rdd.zipWithIndex()
>
> val filteredRDD = indexedRDD.filter { case (element, index) =>
>   index >= 99 && index < 199
> }
>
> val result = filteredRDD.take(100)
>
> Warning: the ordering of the elements in the RDD is not guaranteed.
>
> Mohammed
>
> Author: Big Data Analytics with Spark
>
> -----Original Message-----
> From: SRK [mailto:swethakasire...@gmail.com]
> Sent: Tuesday, February 9, 2016 1:58 PM
> To: user@spark.apache.org
> Subject: How to collect/take arbitrary number of records in the driver?
>
> Hi,
>
> How can I get a fixed number of records from an RDD in the driver? Suppose I
> want records 100 to 1000 and then want to save them to some external database.
> I know I can do this from the workers, per partition, but I want to avoid that
> for various reasons. The idea is to collect the data to the driver and save it
> there, even if slowly.
>
> I am looking for something like take(100, 1000) or take(1000, 2000).
>
> Thanks,
> Swetha

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
