Re: Implicit columns and maps

Jinfeng Ni Fri, 27 Oct 2017 22:56:19 -0700

I think it would make sense to treat the implicit columns as reserved
words, and if user wants to use those names as regular column, or table
name, or any objects in the system, they should use quoted identifier.


select suffix from t1; // implicit columns takes precedence

select `suffix` from t1; // regular columns takes precedence.

That's what database would typically handles the conflicts between
keywords/reserved words vs identifier used for regular objects.

Regarding the change from dir0, dir1 to dir[0], dir[1], personally I do not
think it's a good idea, considering the impact on the application side (it
will break any application that built on Drill using those columns).
However, I agree that we should not add implicit columns in the record
reader, then remove them later, as reported in DRILL-5542 [1].

1. https://issues.apache.org/jira/browse/DRILL-5542


On Mon, Oct 9, 2017 at 2:38 PM, Paul Rogers <[email protected]> wrote:

> As you point out, naming is a separate issue. I believe we inherited names
> from some other system. But, it is an issue that we use “good” names for
> implicit columns. If we add more names “createDate”, “modifcationDate”,
> “owner”, or whatever), we end up breaking someone’s queries that has
> columns with those names.
>
> Would be good to have a prefix, as you suggest ,to separate Drill names
> from user names. As it turns out, Drill supports maps (AKA structs) and
> arrays, so perhaps we could have:
>
> $meta$
> |- filename
> |- fqn
> |- …
> |- dir[]
>
> Where “dir” is an array, rather than the current separate scalar dir0,
> dir1, etc.
>
> The above map let’s us add any number of metadata columns without
> potentially breaking existing queries.
>
> Note that we already have a problem where we can hit a hard schema change
> because one reader sees “/a/b/c/foo.csv” while another sees “a/b/bar.csv”,
> resulting in different numbers of “dirx” columns from the two readers.
>
> Yet another issue is that, in wildcard queries (e.g. “SELECT *”), we add
> all implicit columns, then remove them later. We should optimize this case.
>
> But, even if we keep the original names, and defer the other issues, the
> question about map semantics still stands…
>
> - Paul
>
> > On Oct 9, 2017, at 12:06 PM, Boaz Ben-Zvi <[email protected]> wrote:
> >
> >    How about changing all those “implicit” columns to have some
> “unconventional” prefix, like an underscore (or two _ _ ); e.g. _suffix,
> _dir0, etc .
> >
> > With such a change we may need to handle the transition of existing
> users’ code ; e.g., maybe change the priority (mentioned below) so that an
> existing “suffix” column takes precedence over the implicit one.
> > Or just go “cold turkey” and force the users to change.
> >
> >     Just an idea,
> >
> >            Boaz
> >
> > On 10/9/17, 10:45 AM, "Paul Rogers" <[email protected]> wrote:
> >
> >    Hi All,
> >
> >    Drill provides a set of “implicit” columns to describe files:
> filename, suffix, fan and filepath. Drill also provides an open-ended set
> of partition columns: dir0, dir1, dir2, etc.
> >
> >    Not all readers support the above: some do and some don’t.
> >
> >    Drill semantics seem to treat these as semi-reserved words when a
> reader supports implicit columns. If a table has a “suffix” column, then
> Drill will treat “suffix” as an implicit column, ignoring the table column.
> If the user wants that table column, they can use a session option to
> temporarily rename the implicit column. A bit odd, perhaps, but it is our
> solution.
> >
> >    What is our desired behavior, however, if the user asks for a column
> that includes an implicit column as a prefix: “suffix.a”? Clearly, here,
> “suffix” is a map (i.e. structure) and “a” is a field within that map.
> Since the implicit “suffix” is never a map, should we:
> >
> >    1) Assume that, here, “suffix” is a map column projected from the
> table?
> >    2) Issue an error?
> >    3) Ignore the “.a” part and just return “suffix” as an implicit
> column?
> >    4) Something else?
> >
> >    The code is murky on this point because JSON is implemented far
> differently than text files and so on. Each has its own rules. Do we need
> consistency of behavior, or is reader-specific behavior the expected design?
> >
> >    Thanks,
> >
> >    - Paul
> >
> >
> >
>
>

Re: Implicit columns and maps

Reply via email to