@joe, I'd be glad to help if you need it.

On Mon, May 4, 2015 at 8:06 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> I don't know whether this is common, but we might also allow another
> separator for JSON objects, such as two blank lines.
>
> Matei
>
> > On May 4, 2015, at 2:28 PM, Reynold Xin <r...@databricks.com> wrote:
> >
> > Joe - I think that's a legit and useful thing to do. Do you want to give it
> > a shot?
> >
> > On Mon, May 4, 2015 at 12:36 AM, Joe Halliwell <joe.halliw...@gmail.com>
> > wrote:
> >
> >> I think Reynold’s argument shows the impossibility of the general case.
> >>
> >> But a “maximum object depth” hint could enable a new input format to do
> >> its job both efficiently and correctly in the common case where the input
> >> is an array of similarly structured objects! I’d certainly be interested in
> >> an implementation along those lines.
> >>
> >> Cheers,
> >> Joe
> >>
> >> http://www.joehalliwell.com
> >> @joehalliwell
> >>
> >>
> >> On Mon, May 4, 2015 at 7:55 AM, Reynold Xin <r...@databricks.com> wrote:
> >>
> >>> I took a quick look at that implementation. I'm not sure if it actually
> >>> handles JSON correctly, because it attempts to find the first { starting
> >>> from a random point. However, that random point could be in the middle of
> >>> a string, and thus the first { might just be part of a string, rather than
> >>> a real JSON object starting position.
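
To make Reynold's concern concrete, here is a minimal Python sketch (not the code of the json-mapreduce library; the input document and the helper `naive_object_start` are made up for illustration). It assumes a split boundary lands inside a string value and applies the "first { after the offset" heuristic. Decoding from that position succeeds, so the problem cannot be detected just by checking that the candidate parses:

```python
import json


def naive_object_start(data, offset):
    # Heuristic under discussion: take the first '{' at or after the split offset.
    return data.find("{", offset)


# Hypothetical input: one string value happens to contain "{}".
doc = '[{"note": "see {} here", "id": 1}, {"id": 2}]'

# Assume the split boundary falls inside that string value.
split_offset = doc.index("see")
start = naive_object_start(doc, split_offset)

# Decoding from that position succeeds, but the result is not a record of the input.
phantom, end = json.JSONDecoder().raw_decode(doc, start)

# The records actually present in the document.
real_objects = json.loads(doc)
```

Here `phantom` is an empty object taken from inside the string, and it is not one of the two entries in `real_objects`, so a reader starting at that offset would emit a record that does not exist in the data.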
> >>>
> >>>
> >>> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com>
> >>> wrote:
> >>>
> >>>> You can check out the following library:
> >>>>
> >>>> https://github.com/alexholmes/json-mapreduce
> >>>>
> >>>> --
> >>>> Emre Sevinç
> >>>>
> >>>>
> >>>> On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
> >>>> o.girar...@lateral-thoughts.com> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>> Is there any way in Spark SQL to load multi-line JSON data efficiently, I
> >>>>> think there was in the mailing list a reference to
> >>>>> http://pivotal-field-engineering.github.io/pmr-common/ for its
> >>>>> JSONInputFormat
> >>>>>
> >>>>> But it's rather inaccessible considering the dependency is not available in
> >>>>> any public maven repo (If you know of one, I'd be glad to hear it).
> >>>>>
> >>>>> Is there any plan to address this or any public recommendation ?
> >>>>> (considering the documentation clearly states that sqlContext.jsonFile will
> >>>>> not work for multi-line json(s))
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Olivier.
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Emre Sevinc
> >>>>
> >>>
> >>
> >>
> >
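
For reference, Matei's separator idea from earlier in this thread could look roughly like the sketch below. This is plain Python rather than a Spark input format, and the function name `split_records`, the regular expression, and the sample text are assumptions made for illustration. It relies on the fact that JSON strings cannot contain raw newlines, so a run of blank lines can never fall inside a string value. It also assumes the writer never emits two consecutive blank lines inside a single object.

```python
import json
import re

# Assumed convention: records are separated by two or more blank lines.
RECORD_SEPARATOR = re.compile(r"\n(?:[ \t]*\n){2,}")


def split_records(text):
    # Split on the separator, skip empty chunks, and parse each chunk on its own.
    chunks = RECORD_SEPARATOR.split(text)
    return [json.loads(chunk) for chunk in chunks if chunk.strip()]


# Hypothetical sample: two multi-line objects; the second contains a single
# blank line, which is not treated as a separator.
sample = (
    '{\n  "id": 1,\n  "tags": ["a", "b"]\n}\n'
    "\n\n"
    '{\n  "id": 2,\n\n  "note": "a single blank line inside is fine"\n}\n'
)

records = split_records(sample)
```

A splittable reader could use the same rule: seek to an arbitrary offset, skip forward to the next separator, and start parsing there. That would sidestep the ambiguity of scanning for a bare `{`, at the cost of requiring the input to follow the convention.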