Then how performance of mapPartitions is faster than map?

On Thu, Jun 25, 2015 at 6:40 PM, Daniel Darabos <
daniel.dara...@lynxanalytics.com> wrote:

> Spark creates a RecordReader and uses next() on it when you call
> input.next(). (See
> https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L215)
>  How
> the RecordReader works is an HDFS question, but it's safe to say there is
> no difference between using map and mapPartitions.
>
> On Thu, Jun 25, 2015 at 1:28 PM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> say source is HDFS,And file is divided in 10 partitions. so what will be
>>  input contains.
>>
>> public Iterable<Integer> call(Iterator<String> input)
>>
>> say I have 10 executors in job each having single partition.
>>
>> will it have some part of partition or complete. And if some when I call
>> input.next() - it will fetch rest or how is it handled ?
>>
>>
>>
>>
>>
>> On Thu, Jun 25, 2015 at 3:11 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, or at least, it depends on how the source of the partitions was
>>> implemented.
>>>
>>> On Thu, Jun 25, 2015 at 12:16 PM, Shushant Arora
>>> <shushantaror...@gmail.com> wrote:
>>> > Does mapPartitions keep complete partitions in memory of executor as
>>> > iterable.
>>> >
>>> > JavaRDD<String> rdd = jsc.textFile("path");
>>> > JavaRDD<Integer> output = rdd.mapPartitions(new
>>> > FlatMapFunction<Iterator<String>, Integer>() {
>>> >
>>> > public Iterable<Integer> call(Iterator<String> input)
>>> > throws Exception {
>>> > List<Integer> output = new ArrayList<Integer>();
>>> > while(input.hasNext()){
>>> > output.add(input.next().length());
>>> > }
>>> > return output;
>>> > }
>>> >
>>> > });
>>> >
>>> >
>>> > Here does input is present in memory and can contain complete
>>> partition of
>>> > gbs ?
>>> > Will this function call(Iterator<String> input) is called only for no
>>> of
>>> > partitions(say if I have 10 in this example) times. Not no of lines
>>> > times(say 10000000) .
>>> >
>>> >
>>> > And whats the use of mapPartitionsWithIndex ?
>>> >
>>> > Thanks
>>> >
>>>
>>
>>
>

Reply via email to