Hi Paul,

I have added some comments to DRILL-5974. It is perhaps better to discuss this in the JIRA for future reference.
-Aman

On Sat, Nov 18, 2017 at 7:33 PM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Arina,
>
> The proposal is to represent 2D arrays as a string (using the original,
> unparsed JSON). That is, given this input:
>
> {a: “fred”, b: [[10, 20, 30], [11, 21, 31]]}
>
> The parsed columns are:
>
> a, b
> “fred”, “[[10, 20, 30], [11, 21, 31]]”
>
> Notice that column b is just a string. It is a string of JSON, yes, but
> still just a string.
>
> So, the question about kvgen/flatten does not apply here since we are not
> creating a Drill array.
>
> There is a very interesting discussion to be had about how Drill
> does/should handle “non-relational” JSON structures. But, here, the
> suggestion is just for one very simple special case.
>
> Thanks,
>
> - Paul
>
>
> On Nov 18, 2017, at 7:15 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >
> > In general, sounds good.
> > If a user applies kvgen / flatten over such 2-D array columns read as
> > strings, will they be able to normalize the data into the format they want?
> > Or do we need to come up with a new function?
> >
> > Kind regards
> > Arina
> >
> > On Fri, Nov 17, 2017 at 10:39 PM, Paul Rogers <prog...@mapr.com> wrote:
> >
> >> Hi All,
> >>
> >> I’d like to propose a minor enhancement to the JSON reader to better
> >> handle non-relational JSON structures. (See DRILL-5974 [1].)
> >>
> >> As background, Drill handles simple tuples:
> >>
> >> {a: 10, b: “fred”}
> >>
> >> Drill also handles arrays:
> >>
> >> {name: “fred”, hobbies: [“bowling”, “golf”]}
> >>
> >> Drill even handles arrays of tuples:
> >>
> >> {name: “fred”, orders: [
> >> {id: 1001, amount: 12.34},
> >> {id: 1002, amount: 56.78}]}
> >>
> >> The above are termed "relational" because there is a straightforward
> >> mapping between tables and the above JSON structures.
> >>
> >> Things get interesting with non-relational types, such as 2-D arrays:
> >>
> >> {id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
> >>
> >> Drill has two solutions:
> >>
> >> * Turn on the experimental list and union support.
> >> * Enable all-text mode to read all fields as JSON text.
> >>
> >> Here, I’d like to propose a middle ground:
> >>
> >> * Read fields with relational types into vectors.
> >> * Read non-relational fields using text mode.
> >>
> >> Thus, the first three examples would all result in the JSON data parsed
> >> into Drill vectors. But the fourth, non-relational example would produce a
> >> row that looks like this:
> >>
> >> id, shape, points
> >> 4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
> >>
> >> Although Drill can’t parse the 2-D array, Drill will pass the array along
> >> to the client, which can use its favorite JSON parser to parse the array
> >> and do something useful (like draw the square in this case).
> >>
> >> In particular, the proposal is to:
> >>
> >> * Apply this change only to the revised “batch size aware” JSON reader.
> >> * Use the above parsing model by default.
> >> * Use the experimental list-and-union support if the existing
> >> `exec.enable_union_type` system/session option is set.
> >>
> >> Existing queries should “just work.” In fact, JSON with non-relational
> >> types will now work “out-of-the-box” without all-text mode or the
> >> experimental types.
> >>
> >> Thoughts?
> >>
> >> - Paul
> >>
> >> [1] https://issues.apache.org/jira/browse/DRILL-5974
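
For readers following the thread, the middle-ground rule in the original proposal (parse relational fields, keep non-relational fields as JSON text) can be illustrated with a small Python sketch. This is not Drill code and makes several assumptions: `is_relational` and `read_row` are hypothetical names, the input is strict JSON with quoted keys rather than the shorthand used in the emails, only nested arrays are treated as non-relational, and the text column is produced by re-serializing the parsed value rather than by keeping the original unparsed bytes as the proposal describes.

```python
import json


def is_relational(value):
    """Return True if the value is a scalar, a map, or a 1-D array of those.

    Assumption for this sketch: the only non-relational shape detected is
    an array that directly contains another array (2-D or deeper).
    """
    if isinstance(value, dict):
        return all(is_relational(v) for v in value.values())
    if isinstance(value, list):
        return all(
            not isinstance(item, list) and is_relational(item)
            for item in value
        )
    return True  # numbers, strings, booleans, null


def read_row(line):
    """Parse one JSON record; keep non-relational fields as JSON text."""
    record = json.loads(line)
    return {
        key: value if is_relational(value) else json.dumps(value)
        for key, value in record.items()
    }


if __name__ == "__main__":
    row = read_row(
        '{"id": 4, "shape": "square", '
        '"points": [[0, 0], [0, 5], [5, 0], [5, 5]]}'
    )
    # "points" comes back as a plain string; a client can re-parse it.
    print(row)
    print(json.loads(row["points"])[1])
```

A client receiving the `points` column would call its own JSON parser on the string, as the last line does, which matches the "pass the array along to the client" behavior described in the proposal.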