Re: "Death of Schema-on-Read"

Aman Sinha Sun, 08 Apr 2018 20:39:32 -0700

Responding to the last comment since it is at the heart of the discussion:

On Sun, Apr 8, 2018 at 3:20 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Inline
>
> On Sun, Apr 8, 2018 at 12:36 PM, Aman Sinha <amansi...@apache.org> wrote:
>
> > On Sun, Apr 8, 2018 at 10:57 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> ....

>
> > All of this, though, does not preclude the real need for the 'source of
> > truth' of the schema for the cases where data has been already explored
> and
> > curated. We do want to have a solution for that core issue.
> >
>
> Why is the answer different from a view with casts (that might be pushed
> down)?
>

 I agree that views with CASTs and pushdowns help a lot but they are
limited to column names and data types primarily.
I can think of a few reasons why they are not sufficient (not sure if these
were mentioned by Paul or not):

1. Constraints such as UNIQUEness (for primary keys),  partitioning keys,
collation property etc are not expressible in views.
    These attributes are heavily used by the optimizer.

2.  We want to support collecting statistics via something like ANALYZE
TABLE command.  This would operate on the
  base table rather than views since we want to collect histograms and
number of distinct-values stats etc on the base data
  such that it can be shared among all queries, not necessarily those
against specific views.

3.   INSERT INTO TABLE may need to do schema checks and schema merging
before inserting the data.  These would not
   be possible with views.

-Aman

Re: "Death of Schema-on-Read"

Reply via email to