This was implemented as sc.wholeTextFiles.
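
For reference, a minimal sketch of using that existing API (the path, glob, and
minPartitions value below are illustrative assumptions, not anything from the
thread):

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // wholeTextFiles yields (filePath, fileContent) pairs; many small files can
  // be packed into one partition, and minPartitions is only a lower bound.
  def readSmallFiles(sc: SparkContext): RDD[String] = {
    sc.wholeTextFiles("hdfs:///data/small-files/*", minPartitions = 8)
      .flatMap { case (_, content) => content.split("\n") }
      .setName("small-files")
  }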

On Thu, May 19, 2016, 2:43 AM Reynold Xin <r...@databricks.com> wrote:

> Users would be able to run this already with the 3 lines of code you
> supplied, right? In general, there are already a lot of methods on
> SparkContext, and we lean towards the more conservative side when
> introducing new API variants.
>
> Note that this is something we are doing automatically in Spark SQL for
> file sources (Dataset/DataFrame).
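>
> For what it's worth, a minimal sketch of that Dataset/DataFrame path (the
> path and config value are illustrative assumptions, not the actual
> internals):
>
>   import org.apache.spark.sql.SparkSession
>
>   val spark = SparkSession.builder()
>     .appName("file-source-example")
>     // Upper bound on bytes packed into one input partition (illustrative value).
>     .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
>     .getOrCreate()
>
>   // Small files are automatically grouped into a modest number of partitions.
>   val lines = spark.read.textFile("hdfs:///data/small-files/*")
>   println(lines.rdd.getNumPartitions)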
>
>
> On Sat, May 14, 2016 at 8:13 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hello Everyone
>>
>> Do you think it would be useful to add a combinedTextFile method (which
>> uses CombineTextInputFormat) to SparkContext?
>>
>> It allows one task to read data from multiple text files, and the number
>> of RDD partitions can be controlled by setting
>> mapreduce.input.fileinputformat.split.maxsize.
>>
>>
>>   import org.apache.hadoop.io.{LongWritable, Text}
>>   import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
>>   import org.apache.spark.SparkContext
>>   import org.apache.spark.rdd.RDD
>>
>>   def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
>>     val conf = sc.hadoopConfiguration
>>     // One split can contain data from several files, so fewer tasks are needed.
>>     sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
>>       classOf[LongWritable], classOf[Text], conf)
>>       .map(pair => pair._2.toString).setName(path)
>>   }
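>>
>> Example usage (the path and split size are illustrative): capping each
>> split at 64 MB combines many small files into fewer, larger partitions.
>>
>>   sc.hadoopConfiguration.setLong(
>>     "mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024)
>>   val lines = combinedTextFile(sc)("hdfs:///data/small-files/*")
>>   println(lines.getNumPartitions)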
>>
>>
>> Alex
>>
>
>
