I think Reynold’s argument shows that the general case is impossible.
But a “maximum object depth” hint could let a new input format do its job both efficiently and correctly in the common case where the input is an array of similarly structured objects! I’d certainly be interested in an implementation along those lines.

Cheers,
Joe

http://www.joehalliwell.com
@joehalliwell

On Mon, May 4, 2015 at 7:55 AM, Reynold Xin <r...@databricks.com> wrote:

> I took a quick look at that implementation. I'm not sure it actually
> handles JSON correctly: it attempts to find the first { starting from a
> random point, but that point could fall in the middle of a string, in
> which case the first { is just part of a string literal rather than the
> start of a real JSON object.
>
> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:
>
>> You can check out the following library:
>>
>> https://github.com/alexholmes/json-mapreduce
>>
>> --
>> Emre Sevinç
>>
>> On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>> > Hi everyone,
>> > Is there any way in Spark SQL to load multi-line JSON data
>> > efficiently? I think there was a reference on the mailing list to
>> > http://pivotal-field-engineering.github.io/pmr-common/ for its
>> > JSONInputFormat.
>> >
>> > But it's rather inaccessible, considering the dependency is not
>> > available in any public Maven repo (if you know of one, I'd be glad
>> > to hear it).
>> >
>> > Is there any plan to address this, or any public recommendation?
>> > (The documentation clearly states that sqlContext.jsonFile will not
>> > work for multi-line JSON.)
>> >
>> > Regards,
>> >
>> > Olivier.
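P.S. To make the failure mode concrete: a splitter must track string and escape state, not just scan for braces. Below is a minimal sketch (my own illustration, not the json-mapreduce or pmr-common implementation) of a scanner that walks a chunk known to start at a top-level boundary, keeping brace depth and in-string state, so a `{` inside a string value is never mistaken for a record boundary. The function name `split_json_objects` is hypothetical. The depth counter is also where a "maximum object depth" hint would plug in: a reader starting at a random offset could use it to bound how far it must scan to resynchronize.

```python
import json

def split_json_objects(chunk):
    """Yield complete top-level JSON objects from a chunk that begins
    at a top-level boundary.

    Tracks string/escape state so that a '{' inside a string literal
    is never treated as an object boundary -- the failure mode of the
    naive "find the first '{' from a random point" approach.
    """
    objects = []
    depth = 0          # current brace nesting depth
    in_string = False  # are we inside a JSON string literal?
    escaped = False    # was the previous character a backslash?
    start = None       # index where the current top-level object began
    for i, ch in enumerate(chunk):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                objects.append(json.loads(chunk[start:i + 1]))
    return objects

# A record whose string value contains '{' -- a brace-only scanner
# would split this in the wrong place.
data = '[{"name": "curly {brace}", "id": 1},\n {"name": "plain", "id": 2}]'
print(split_json_objects(data))
```

A real InputFormat would additionally have to handle splits that start mid-record, which is where the general case gets hard, per Reynold's point: without extra information (or a depth hint), you cannot tell from local context whether an offset is inside a string.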