This was implemented as sc.wholeTextFiles.
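
For reference, a minimal sketch of using that existing API (the path, glob, and
minPartitions value below are illustrative assumptions, not anything from the
thread):

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // wholeTextFiles yields (filePath, fileContent) pairs; many small files can
  // be packed into one partition, and minPartitions is only a lower bound.
  def readSmallFiles(sc: SparkContext): RDD[String] = {
    sc.wholeTextFiles("hdfs:///data/small-files/*", minPartitions = 8)
      .flatMap { case (_, content) => content.split("\n") }
      .setName("small-files")
  }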

On Thu, May 19, 2016, 2:43 AM Reynold Xin <r...@databricks.com> wrote:

> Users would be able to run this already with the 3 lines of code you
> supplied, right? In general, there are already a lot of methods on
> SparkContext, and we lean towards the more conservative side when
> introducing new API variants.
>
> Note that this is something we are doing automatically in Spark SQL for
> file sources (Dataset/DataFrame).
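>
> For what it's worth, a minimal sketch of that Dataset/DataFrame path (the
> path and config value are illustrative assumptions, not the actual
> internals):
>
>   import org.apache.spark.sql.SparkSession
>
>   val spark = SparkSession.builder()
>     .appName("file-source-example")
>     // Upper bound on bytes packed into one input partition (illustrative value).
>     .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
>     .getOrCreate()
>
>   // Small files are automatically grouped into a modest number of partitions.
>   val lines = spark.read.textFile("hdfs:///data/small-files/*")
>   println(lines.rdd.getNumPartitions)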
>
>
> On Sat, May 14, 2016 at 8:13 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hello Everyone
>>
>> Do you think it would be useful to add a combinedTextFile method (which
>> uses CombineTextInputFormat) to SparkContext?
>>
>> It allows one task to read data from multiple text files, and the number
>> of RDD partitions can be controlled by setting
>> mapreduce.input.fileinputformat.split.maxsize.
>>
>>
>>   import org.apache.hadoop.io.{LongWritable, Text}
>>   import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
>>   import org.apache.spark.SparkContext
>>   import org.apache.spark.rdd.RDD
>>
>>   def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
>>     val conf = sc.hadoopConfiguration
>>     // One split can contain data from several files, so fewer tasks are needed.
>>     sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
>>       classOf[LongWritable], classOf[Text], conf)
>>       .map(pair => pair._2.toString).setName(path)
>>   }
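>>
>> Example usage (the path and split size are illustrative): capping each
>> split at 64 MB combines many small files into fewer, larger partitions.
>>
>>   sc.hadoopConfiguration.setLong(
>>     "mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024)
>>   val lines = combinedTextFile(sc)("hdfs:///data/small-files/*")
>>   println(lines.getNumPartitions)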
>>
>>
>> Alex
>>
>
>
