I'll try to study that and get back to you. Regards, Olivier.
On Mon, May 4, 2015 at 04:05, Reynold Xin <r...@databricks.com> wrote:

> How does the Pivotal format decide where to split the files? It seems to
> me the challenge is deciding that, and off the top of my head the only way
> to do it is to scan from the beginning and parse the JSON properly, which
> makes it infeasible for large files (doable for a whole input made of many
> small files, though). If there is a better way, we should do it.
>
>
> On Sun, May 3, 2015 at 1:04 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> Is there any way in Spark SQL to load multi-line JSON data efficiently? I
>> think there was a reference on the mailing list to
>> http://pivotal-field-engineering.github.io/pmr-common/ for its
>> JSONInputFormat.
>>
>> But it's rather inaccessible, considering the dependency is not available
>> in any public Maven repo (if you know of one, I'd be glad to hear it).
>>
>> Is there any plan to address this, or any public recommendation?
>> (The documentation clearly states that sqlContext.jsonFile will
>> not work for multi-line JSON.)
>>
>> Regards,
>>
>> Olivier.
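For readers following along, the splitting problem Reynold describes can be illustrated with plain Python (a sketch only, not Spark code; the sample records are made up). Newline-delimited JSON can be parsed record by record, so an input split can begin at any newline boundary; a pretty-printed multi-line document reveals its record boundaries only by parsing from the start:

```python
import json

# Newline-delimited JSON: each line is a complete record, so a reader
# can start at any newline boundary without a global parse.
ndjson = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
records = [json.loads(line) for line in ndjson.splitlines()]

# Multi-line (pretty-printed) JSON: the same data, but record boundaries
# are only known after parsing the whole document from the beginning.
multiline = '[\n  {"id": 1},\n  {"id": 2},\n  {"id": 3}\n]'
records2 = json.loads(multiline)  # must consume the entire document

print(records == records2)  # same data, very different splittability
```

A common workaround in Spark 1.x was to read each file whole with `sc.wholeTextFiles`, parse it on the driver-free executor side, and feed the resulting records to `sqlContext.jsonRDD` — which sidesteps splitting entirely, at the cost of one-file-per-task granularity.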