Hi Arina,

The proposal is to represent 2D arrays as a string (using the original, unparsed JSON). That is, given this input:

{a: “fred”, b: [[10, 20, 30], [11, 21, 31]]}

The parsed columns are:

a, b
“fred”, “[[10, 20, 30], [11, 21, 31]]”

Notice that column b is just a string. It is a string of JSON, yes, but still just a string. So, the question about kvgen/flatten does not apply here, since we are not creating a Drill array.

There is a very interesting discussion to be had about how Drill does/should handle “non-relational” JSON structures. But, here, the suggestion is just for one very simple special case.

Thanks,
- Paul

> On Nov 18, 2017, at 7:15 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com>
> wrote:
>
> In general sounds good.
> If user will apply kvgen / flatten over such 2-D array columns read as
> string, he will be able to normalize data in the format he wants? Right? Or
> we need to come up with new function?
>
> Kind regards
> Arina
>
> On Fri, Nov 17, 2017 at 10:39 PM, Paul Rogers <prog...@mapr.com> wrote:
>
>> Hi All,
>>
>> I’d like to propose a minor enhancement to the JSON reader to better
>> handle non-relational JSON structures. (See DRILL-5974 [1].)
>>
>> As background, Drill handles simple tuples:
>>
>> {a: 10, b: “fred”}
>>
>> Drill also handles arrays:
>>
>> {name: “fred”, hobbies: [“bowling”, “golf”]}
>>
>> Drill even handles arrays of tuples:
>>
>> {name: “fred”, orders: [
>> {id: 1001, amount: 12.34},
>> {id: 1002, amount: 56.78}]}
>>
>> The above are termed "relational" because there is a straightforward
>> mapping to/from tables into the above JSON structures.
>>
>> Things get interesting with non-relational types, such as 2-D arrays:
>>
>> {id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
>>
>> Drill has two solutions:
>>
>> * Turn on the experimental list and union support.
>> * Enable all-text mode to read all fields as JSON text.
>>
>> Here, I’d like to propose a middle ground:
>>
>> * Read fields with relational types into vectors.
>> * Read non-relational fields using text mode.
>>
>> Thus, the first three examples would all result in the JSON data parsed
>> into Drill vectors. But, the fourth, non-relational example would produce a
>> row that looks like this:
>>
>> id, shape, points
>> 4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
>>
>> Although Drill can’t parse the 2-D array, Drill will pass the array along
>> to the client, which can use its favorite JSON parser to parse the array
>> and do something useful (like draw the square in this case.)
>>
>> In particular, the proposal:
>>
>> * Apply this change only to the revised “batch size aware” JSON reader.
>> * Use the above parsing model by default.
>> * Use the experimental list-and-union support if the existing
>> `exec.enable_union_type` system/session option is set.
>>
>> Existing queries should “just work.” In fact, now JSON with non-relational
>> types will work “out-of-the-box” without all-text mode or the experimental
>> types.
>>
>> Thoughts?
>>
>> - Paul
>>
>> [1] https://issues.apache.org/jira/browse/DRILL-5974
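
For illustration only, the "middle ground" described in the thread can be sketched in a few lines of Python. This is not Drill code: `is_relational` and `read_record` are hypothetical names, the input is written as strict JSON rather than the relaxed notation used in the examples, and the sketch re-serializes the nested array with `json.dumps`, whereas the proposal keeps the original, unparsed JSON text. The last two lines show the client-side step of parsing the string column.

```python
import json


def is_relational(value):
    """Hypothetical check: scalars, 1-D arrays, objects and arrays of objects."""
    if isinstance(value, list):
        # A list that directly contains another list is a 2-D (or deeper) array.
        if any(isinstance(item, list) for item in value):
            return False
        return all(is_relational(item) for item in value)
    if isinstance(value, dict):
        return all(is_relational(v) for v in value.values())
    return True


def read_record(line):
    """Parse one JSON record; keep non-relational fields as JSON text."""
    record = json.loads(line)
    row = {}
    for key, value in record.items():
        if is_relational(value):
            row[key] = value
        else:
            # Assumption: re-serialized here, not the original unparsed text.
            row[key] = json.dumps(value)
    return row


row = read_record(
    '{"id": 4, "shape": "square", "points": [[0, 0], [0, 5], [5, 0], [5, 5]]}'
)

# Client side: the "points" column is a plain string of JSON, so parse it.
points = json.loads(row["points"])
```

Here `row["id"]` and `row["shape"]` stay as ordinary values, while `row["points"]` is a string that the client turns back into a list of coordinate pairs.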