I took a quick look at that implementation. I'm not sure it actually handles JSON correctly, because it attempts to find the first { starting from an arbitrary split point. That point could fall in the middle of a string literal, in which case the first { it finds may just be a character inside a string, rather than the start of a real JSON object.
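The failure mode is easy to reproduce in a few lines. This is just a minimal sketch in plain Python (independent of the library in question) simulating a split boundary that lands mid-string:

```python
import json

# A JSON record whose string value happens to contain a '{' character.
record = '{"message": "use {braces} carefully", "id": 1}'

# Simulate a split boundary landing inside the record: the reader sees
# only this chunk and scans forward for the first '{' as a record start.
chunk = record[5:]           # starts in the middle of the "message" key
start = chunk.find('{')      # finds the '{' inside the string value
candidate = chunk[start:]    # '{braces} carefully", "id": 1}'

try:
    json.loads(candidate)
    print("parsed")
except json.JSONDecodeError:
    # The brace was part of a string, not a record boundary.
    print("not a valid record start")
```

A scanner that wants to resume mid-file robustly would have to track string/escape state from a known-good position, not just search for the next brace.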
On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:

> You can check out the following library:
>
> https://github.com/alexholmes/json-mapreduce
>
> --
> Emre Sevinç
>
>
> On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
> > Hi everyone,
> > Is there any way in Spark SQL to load multi-line JSON data efficiently? I
> > think there was in the mailing list a reference to
> > http://pivotal-field-engineering.github.io/pmr-common/ for its
> > JSONInputFormat.
> >
> > But it's rather inaccessible considering the dependency is not available in
> > any public maven repo (if you know of one, I'd be glad to hear it).
> >
> > Is there any plan to address this, or any public recommendation?
> > (Considering the documentation clearly states that sqlContext.jsonFile will
> > not work for multi-line JSON.)
> >
> > Regards,
> >
> > Olivier.