How does the Pivotal input format decide where to split the files? It seems
to me that deciding the split points is the real challenge, and off the top
of my head the only way to do it is to scan from the beginning and parse the
JSON properly, which rules out large files (though it is doable by reading
each file whole when the input is many small files). If there is a better
way, we should do it.
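
For the many-small-files case, here is a rough sketch of what I have in
mind (assuming Spark 1.3+, an existing SparkContext sc, and a made-up input
path): read each file whole so the multi-line JSON stays intact, then hand
the raw strings to jsonRDD, which parses each element as one JSON record:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // wholeTextFiles yields one (path, content) pair per file, so a
    // multi-line JSON document is never split across records.
    val raw = sc.wholeTextFiles("/data/json/*").map { case (_, content) => content }

    // Each RDD element is parsed as a single JSON document.
    val df = sqlContext.jsonRDD(raw)
    df.printSchema()

Of course this only works when every file fits comfortably in memory on a
single executor, which is exactly why it doesn't help for the large-file
case.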


On Sun, May 3, 2015 at 1:04 PM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> Is there any way in Spark SQL to load multi-line JSON data efficiently? I
> think there was a reference on the mailing list to
> http://pivotal-field-engineering.github.io/pmr-common/ for its
> JSONInputFormat.
>
> But it's rather inaccessible, considering the dependency is not available
> in any public Maven repo (if you know of one, I'd be glad to hear it).
>
> Is there any plan to address this, or any public recommendation? (The
> documentation clearly states that sqlContext.jsonFile will not work for
> multi-line JSON.)
>
> Regards,
>
> Olivier.
>
