Hi Paul,
I have added some comments to DRILL-5974. It is probably better to discuss
this in the JIRA for future reference.

-Aman

On Sat, Nov 18, 2017 at 7:33 PM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Arina,
>
> The proposal is to represent 2-D arrays as a string (using the original,
> unparsed JSON). That is, given this input:
>
> {a: “fred”, b: [[10, 20, 30], [11, 21, 31]]}
>
> The parsed columns are:
>
> a, b
> “fred”, “[[10, 20, 30], [11, 21, 31]]”
>
> Notice that column b is just a string. It is a string of JSON, yes, but
> still just a string.
>
> So, the question about kvgen/flatten does not apply here since we are not
> creating a Drill array.
>
> There is a very interesting discussion to be had about how Drill
> does/should handle “non-relational” JSON structures. But here, the
> suggestion is just for one very simple special case.
>
> Thanks,
>
> - Paul
>
> > On Nov 18, 2017, at 7:15 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >
> > In general, sounds good.
> > If a user applies kvgen / flatten to such 2-D array columns read as a
> > string, will they be able to normalize the data into the format they
> > want? Or do we need to come up with a new function?
> >
> > Kind regards
> > Arina
> >
> > On Fri, Nov 17, 2017 at 10:39 PM, Paul Rogers <prog...@mapr.com> wrote:
> >
> >> Hi All,
> >>
> >> I’d like to propose a minor enhancement to the JSON reader to better
> >> handle non-relational JSON structures. (See DRILL-5974 [1].)
> >>
> >> As background, Drill handles simple tuples:
> >>
> >> {a: 10, b: “fred”}
> >>
> >> Drill also handles arrays:
> >>
> >> {name: “fred”, hobbies: [“bowling”, “golf”]}
> >>
> >> Drill even handles arrays of tuples:
> >>
> >> {name: “fred”, orders: [
> >>  {id: 1001, amount: 12.34},
> >>  {id: 1002, amount: 56.78}]}
> >>
> >> The above are termed “relational” because there is a straightforward
> >> mapping between tables and the above JSON structures.
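A minimal Python sketch (not Drill code) of what the “straightforward mapping” means for the arrays-of-tuples example: each element of `orders` becomes one row, with the parent `name` repeated, similar in spirit to what Drill's FLATTEN function produces. The helper name `orders_to_rows` is hypothetical, and the keys are quoted because Python's `json` module only accepts strict JSON.

```python
import json

# The "arrays of tuples" record from above, written as strict JSON
# (quoted keys and plain ASCII quotes).
RECORD = '''{"name": "fred", "orders": [
  {"id": 1001, "amount": 12.34},
  {"id": 1002, "amount": 56.78}]}'''


def orders_to_rows(text):
    """Hypothetical helper: map one record to (name, id, amount) rows.

    Each element of the "orders" array becomes one row, and the parent
    field "name" is repeated on every row.
    """
    record = json.loads(text)
    return [(record["name"], order["id"], order["amount"])
            for order in record["orders"]]


for row in orders_to_rows(RECORD):
    print(row)
```

Because every level is either a tuple (a row) or an array of tuples (a set of rows), no value ever needs a type that a flat table cannot hold.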
> >>
> >> Things get interesting with non-relational types, such as 2-D arrays:
> >>
> >> {id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
> >>
> >> Drill has two solutions:
> >>
> >> * Turn on the experimental list and union support.
> >> * Enable all-text mode to read all fields as JSON text.
> >>
> >> Here, I’d like to propose a middle ground:
> >>
> >> * Read fields with relational types into vectors.
> >> * Read non-relational fields using text mode.
> >>
> >> Thus, the first three examples would all result in the JSON data being
> >> parsed into Drill vectors. But the fourth, non-relational example would
> >> produce a row that looks like this:
> >>
> >> id, shape, points
> >> 4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
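The proposed middle ground can be sketched in Python as follows. This is an illustration under stated assumptions, not the Drill reader: `is_relational` and `read_record` are hypothetical names; the classifier only checks nesting (an array that contains arrays is non-relational) and ignores type consistency within an array; and `json.dumps` re-serializes the value instead of passing through the original unparsed text, so whitespace and number formatting may differ from the input.

```python
import json


def is_relational(value):
    """Hypothetical classifier: True for a scalar, a map of relational
    values, an array of scalars, or an array of relational maps.
    Any array that directly contains an array (e.g. a 2-D array) is
    treated as non-relational. Type mixing within arrays is not checked.
    """
    if isinstance(value, dict):
        return all(is_relational(v) for v in value.values())
    if isinstance(value, list):
        for item in value:
            if isinstance(item, list):
                return False          # array of arrays: non-relational
            if isinstance(item, dict) and not is_relational(item):
                return False          # a map nested in the array has a 2-D array
        return True
    return True                       # scalars: str, int, float, bool, None


def read_record(text):
    """Keep relational fields as parsed values; turn non-relational
    fields into JSON text (re-serialized here, as an approximation of
    passing the original text through)."""
    record = json.loads(text)
    return {name: value if is_relational(value) else json.dumps(value)
            for name, value in record.items()}


row = read_record('{"id": 4, "shape": "square", '
                  '"points": [[0, 0], [0, 5], [5, 0], [5, 5]]}')
print(row)
```

With the fourth example as input, `id` and `shape` come back as a number and a string, while `points` comes back as the string `"[[0, 0], [0, 5], [5, 0], [5, 5]]"`.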
> >>
> >> Although Drill can’t parse the 2-D array, Drill will pass the array along
> >> to the client, which can use its favorite JSON parser to parse the array
> >> and do something useful (like draw the square, in this case).
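On the client side, the `points` column arrives as an ordinary string, so any JSON parser can recover the array. A minimal Python sketch, assuming the client already has the column value in hand (how it is fetched, for example over JDBC or the REST API, is out of scope here). Computing a bounding box stands in for “draw the square”, and `bounding_box` is a hypothetical helper.

```python
import json

# The "points" value as the client would receive it: a plain string column.
points_text = "[[0, 0], [0, 5], [5, 0], [5, 5]]"


def bounding_box(text):
    """Parse the JSON text of a 2-D array of [x, y] pairs and return
    (min_x, min_y, max_x, max_y)."""
    points = json.loads(text)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


print(bounding_box(points_text))  # (0, 0, 5, 5)
```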
> >>
> >> In particular, the proposal:
> >>
> >> * Apply this change only to the revised “batch size aware” JSON reader.
> >> * Use the above parsing model by default.
> >> * Use the experimental list-and-union support if the existing
> >> `exec.enable_union_type` system/session option is set.
> >>
> >> Existing queries should “just work.” In fact, now JSON with non-relational
> >> types will work “out-of-the-box” without all-text mode or the experimental
> >> types.
> >>
> >> Thoughts?
> >>
> >> - Paul
> >>
> >> [1] https://issues.apache.org/jira/browse/DRILL-5974
> >>
> >>
> >>
>
>
