I took a quick look at that implementation. I'm not sure it actually handles JSON correctly, because it attempts to find the first { starting from an arbitrary split point. That point could fall in the middle of a string literal, in which case the first { it finds may just be a character inside a string, rather than the start of a real JSON object.
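The failure mode is easy to reproduce in a few lines. This is just a minimal sketch in plain Python (independent of the library in question) simulating a split boundary that lands mid-string:

```python
import json

# A JSON record whose string value happens to contain a '{' character.
record = '{"message": "use {braces} carefully", "id": 1}'

# Simulate a split boundary landing inside the record: the reader sees
# only this chunk and scans forward for the first '{' as a record start.
chunk = record[5:]           # starts in the middle of the "message" key
start = chunk.find('{')      # finds the '{' inside the string value
candidate = chunk[start:]    # '{braces} carefully", "id": 1}'

try:
    json.loads(candidate)
    print("parsed")
except json.JSONDecodeError:
    # The brace was part of a string, not a record boundary.
    print("not a valid record start")
```

A scanner that wants to resume mid-file robustly would have to track string/escape state from a known-good position, not just search for the next brace.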
On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:

> You can check out the following library:
>
> https://github.com/alexholmes/json-mapreduce
>
> --
> Emre Sevinç
>
>
> On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
> > Hi everyone,
> > Is there any way in Spark SQL to load multi-line JSON data efficiently? I
> > think there was in the mailing list a reference to
> > http://pivotal-field-engineering.github.io/pmr-common/ for its
> > JSONInputFormat.
> >
> > But it's rather inaccessible considering the dependency is not available in
> > any public maven repo (if you know of one, I'd be glad to hear it).
> >
> > Is there any plan to address this, or any public recommendation?
> > (Considering the documentation clearly states that sqlContext.jsonFile will
> > not work for multi-line JSON.)
> >
> > Regards,
> >
> > Olivier.