The partitions parameter to textFile is minPartitions, so you get at least
that level of parallelism. Spark delegates creating the splits for that file
to Hadoop (yes, even for a text file on local disk and not HDFS). You can
take a look at the code in FileInputFormat - but briefly, it computes a goal
size of totalSize / numSplits and then sizes each split as
max(minSize, min(goalSize, blockSize)).
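That sizing rule can be sketched as a few lines of plain Scala (a simplified
sketch - the names goalSize/minSize/blockSize follow Hadoop's
FileInputFormat.getSplits, but this is not the actual Hadoop code):

```scala
// Simplified sketch of FileInputFormat's split sizing: the goal is
// totalSize / numSplits bytes per split, clamped between minSize and
// the HDFS block size.
def splitSize(totalSize: Long, numSplits: Int,
              minSize: Long, blockSize: Long): Long = {
  val goalSize = totalSize / math.max(1, numSplits)
  math.max(minSize, math.min(goalSize, blockSize))
}

// e.g. a 4 KB README with minPartitions = 2: goal is 2048 bytes per
// split, far under a 128 MB block, so you get 2 splits of ~2 KB each.
```

This is why a small local file still comes back with (at least) the requested
minPartitions: the goal size shrinks until enough splits fit.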
Hi folks, puzzled by something pretty simple:
I have a standalone cluster with default parallelism of 2, and spark-shell
running with 2 cores.
sc.textFile("README.md").partitions.size returns 2 (this makes sense)
sc.textFile("README.md").coalesce(100, true).partitions.size returns 100,
which also makes sense
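For comparison, the second argument to coalesce is the part doing the work
here: without shuffle = true, coalesce can only reduce the partition count.
A quick spark-shell sketch (assuming the same README.md and a 2-partition
starting point):

```scala
// coalesce without a shuffle can only merge existing partitions,
// so asking for more than you have is a no-op.
val rdd = sc.textFile("README.md")                  // 2 partitions here
rdd.coalesce(100).partitions.size                   // still 2: no shuffle
rdd.coalesce(100, shuffle = true).partitions.size   // 100: full shuffle
rdd.repartition(100).partitions.size                // 100: same as above
```

repartition(n) is simply shorthand for coalesce(n, shuffle = true).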