I think Reynold’s argument shows that the general case is impossible: starting 
from an arbitrary offset, you can’t tell whether a { you find is a real object 
boundary or just part of a string literal.
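
To make the failure mode concrete, consider a contrived record like

    {"comment": "use { and } to delimit blocks", "id": 42}

A scan that lands just before the { inside the string will happily treat it 
as the start of an object and emit garbage from there.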



But a “maximum object depth” hint could enable a new input format to do its job 
both efficiently and correctly in the common case where the input is an array 
of similarly structured objects! I’d certainly be interested in an 
implementation along those lines.
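
To make that concrete, here's a rough sketch, in plain Scala rather than as a 
real Hadoop InputFormat, of how such a hint might be used. Everything here is 
made up for illustration (DepthHintSplitter, splitObjects, and the bail-out 
policy are my own inventions, not Spark or Hadoop API): scan while tracking 
brace depth and string/escape state, and treat the hint as a sanity bound, 
giving up if the observed depth ever exceeds it, since that suggests we 
started from a bad position (e.g. inside a string literal).

    object DepthHintSplitter {

      /** Split the body of a top-level JSON array into the raw text of its
        * element objects by tracking brace depth and string state. Assumes
        * we start at a known-good position (the opening of the array); the
        * maxDepth hint is used purely as a sanity bound. */
      def splitObjects(input: String, maxDepth: Int): Option[Seq[String]] = {
        val records = Seq.newBuilder[String]
        var depth = 0        // current object nesting depth
        var inString = false // are we inside a string literal?
        var escaped = false  // was the previous char a backslash?
        var start = -1       // offset of the current record's opening {

        var i = 0
        while (i < input.length) {
          val c = input.charAt(i)
          if (inString) {
            if (escaped) escaped = false
            else if (c == '\\') escaped = true
            else if (c == '"') inString = false
          } else c match {
            case '"' => inString = true
            case '{' =>
              if (depth == 0) start = i
              depth += 1
              if (depth > maxDepth) return None // hint violated: bad start?
            case '}' =>
              depth -= 1
              if (depth < 0) return None // unbalanced: bad start?
              if (depth == 0) records += input.substring(start, i + 1)
            case _ => () // commas, whitespace, etc. between records
          }
          i += 1
        }
        Some(records.result())
      }
    }

For example, splitObjects("""[{"a": 1}, {"b": {"c": 2}}]""", maxDepth = 2) 
yields Some(Seq("""{"a": 1}""", """{"b": {"c": 2}}""")), while maxDepth = 1 
on the same input yields None. A real record reader would do the same 
bookkeeping over a stream, and could use the bound to decide how far past a 
split boundary to scan before giving up.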




Cheers,

Joe



http://www.joehalliwell.com

@joehalliwell

On Mon, May 4, 2015 at 7:55 AM, Reynold Xin <r...@databricks.com> wrote:

> I took a quick look at that implementation. I'm not sure if it actually
> handles JSON correctly, because it attempts to find the first { starting
> from a random point. However, that random point could be in the middle of a
> string, and thus the first { might just be part of a string, rather than a
> real JSON object starting position.
>
> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:
>> You can check out the following library:
>>
>>    https://github.com/alexholmes/json-mapreduce
>>
>> --
>> Emre Sevinç
>>
>>
>> On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>> > Hi everyone,
>> > Is there any way in Spark SQL to load multi-line JSON data efficiently?
>> > I think there was a reference on the mailing list to
>> > http://pivotal-field-engineering.github.io/pmr-common/ for its
>> > JSONInputFormat.
>> >
>> > But it's rather inaccessible, considering the dependency is not
>> > available in any public Maven repo (if you know of one, I'd be glad
>> > to hear it).
>> >
>> > Is there any plan to address this, or any public recommendation?
>> > (The documentation clearly states that sqlContext.jsonFile will not
>> > work for multi-line JSON.)
>> >
>> > Regards,
>> >
>> > Olivier.
>> >
>>
>>
>>
>> --
>> Emre Sevinc
>>
