Hello everyone,

Do you think it would be useful to add a combinedTextFile method (which uses CombineTextInputFormat) to SparkContext?
It allows a single task to read data from multiple text files, and it lets you control the number of RDD partitions by setting mapreduce.input.fileinputformat.split.maxsize (for example, a value of 134217728 targets roughly 128 MB of input per partition).

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
      val conf = sc.hadoopConfiguration
      // CombineTextInputFormat packs many small files into each split,
      // so one task can process several files at once.
      sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
          classOf[LongWritable], classOf[Text], conf)
        .map(pair => pair._2.toString)
        .setName(path)
    }

Alex