Then how performance of mapPartitions is faster than map? On Thu, Jun 25, 2015 at 6:40 PM, Daniel Darabos < daniel.dara...@lynxanalytics.com> wrote:
> Spark creates a RecordReader and uses next() on it when you call > input.next(). (See > https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L215) > How > the RecordReader works is an HDFS question, but it's safe to say there is > no difference between using map and mapPartitions. > > On Thu, Jun 25, 2015 at 1:28 PM, Shushant Arora <shushantaror...@gmail.com > > wrote: > >> say source is HDFS,And file is divided in 10 partitions. so what will be >> input contains. >> >> public Iterable<Integer> call(Iterator<String> input) >> >> say I have 10 executors in job each having single partition. >> >> will it have some part of partition or complete. And if some when I call >> input.next() - it will fetch rest or how is it handled ? >> >> >> >> >> >> On Thu, Jun 25, 2015 at 3:11 PM, Sean Owen <so...@cloudera.com> wrote: >> >>> No, or at least, it depends on how the source of the partitions was >>> implemented. >>> >>> On Thu, Jun 25, 2015 at 12:16 PM, Shushant Arora >>> <shushantaror...@gmail.com> wrote: >>> > Does mapPartitions keep complete partitions in memory of executor as >>> > iterable. >>> > >>> > JavaRDD<String> rdd = jsc.textFile("path"); >>> > JavaRDD<Integer> output = rdd.mapPartitions(new >>> > FlatMapFunction<Iterator<String>, Integer>() { >>> > >>> > public Iterable<Integer> call(Iterator<String> input) >>> > throws Exception { >>> > List<Integer> output = new ArrayList<Integer>(); >>> > while(input.hasNext()){ >>> > output.add(input.next().length()); >>> > } >>> > return output; >>> > } >>> > >>> > }); >>> > >>> > >>> > Here does input is present in memory and can contain complete >>> partition of >>> > gbs ? >>> > Will this function call(Iterator<String> input) is called only for no >>> of >>> > partitions(say if I have 10 in this example) times. Not no of lines >>> > times(say 10000000) . >>> > >>> > >>> > And whats the use of mapPartitionsWithIndex ? >>> > >>> > Thanks >>> > >>> >> >> >