For testing purposes you can take a sample of your data with take() and then turn that smaller dataset back into an RDD.
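For example, a minimal Scala sketch of that approach; the SparkContext name `sc`, the HDFS path, and the sample size are placeholder assumptions on my part, not from this thread:

// Build the full RDD from a text file (path is a placeholder).
val fullRdd = sc.textFile("hdfs:///data/input.txt")

// take() pulls the first N records back to the driver as a local array.
val sample = fullRdd.take(1000)

// parallelize() turns that small local collection back into an RDD,
// so the rest of the pipeline can be tested against it unchanged.
val sampleRdd = sc.parallelize(sample)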
-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:41 PM Eastern Standard Time
To: Ganelin, Ilya
Subject: Re: Spark- How can I run MapReduce only on one partition in an RDD?

Hi Ganelin,

Thank you for your reply. I can get partition information with partitions(), but I cannot turn a single partition into a new RDD that I can work with. I know it doesn't make much sense to build a large RDD and then use only one partition of it; what I really want is to map each partition one at a time, so I can get early map results out of the RDD quickly. That's why I was considering reading the file on HDFS into multiple RDDs.

Any suggestions?

Thanks,
Tim

2014-11-13 17:05 GMT-06:00 Ganelin, Ilya <ilya.gane...@capitalone.com>:

Why do you only want the third partition? You can access individual partitions using the partitions() function. You can also filter your data with the filter() function so that it contains only the data you care about. Moreover, unless you define a custom partitioner when you create your RDDs, you have no control over what data ends up in partition #3. Therefore, there is almost no reason to want to operate on an individual partition.

-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:01 PM Eastern Standard Time
To: u...@spark.apache.org
Subject: Spark- How can I run MapReduce only on one partition in an RDD?

Hi All,

I use textFile to create an RDD, but I don't want to process the whole dataset in that RDD. For example, I may only want to handle the data in the 3rd partition of the RDD. How can I do that?

Here are some possible solutions I'm considering:
1. Create multiple RDDs when reading the file.
2. Run the MapReduce functions on a specific partition of the RDD. However, I cannot find any appropriate function for this.

Thank you, and I look forward to your suggestions.

Best,
Tim
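For reference, a minimal Scala sketch of the filter() suggestion quoted above, plus one option the thread does not mention, mapPartitionsWithIndex, which applies a function per partition and exposes the partition index. The SparkContext name `sc`, the path, the predicate, and the per-record transform are all assumptions for illustration:

// Placeholder input path, not from the thread.
val rdd = sc.textFile("hdfs:///data/input.txt")

// Ganelin's suggestion: restrict by value with filter(), not by partition number.
val interesting = rdd.filter(line => line.contains("ERROR"))

// A possibility for "map each partition one by one" not mentioned in the thread:
// mapPartitionsWithIndex keeps only the partition with a given index
// (index 2 here, i.e. the 3rd partition, zero-based).
val thirdPartitionOnly = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 2) iter.map(_.toUpperCase) else Iterator.empty
}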