Responding to the last comment since it is at the heart of the discussion: On Sun, Apr 8, 2018 at 3:20 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Inline > > On Sun, Apr 8, 2018 at 12:36 PM, Aman Sinha <amansi...@apache.org> wrote: > > > On Sun, Apr 8, 2018 at 10:57 AM, Ted Dunning <ted.dunn...@gmail.com> > > wrote: > .... > > > All of this, though, does not preclude the real need for the 'source of > > truth' of the schema for the cases where data has been already explored > and > > curated. We do want to have a solution for that core issue. > > > > Why is the answer different from a view with casts (that might be pushed > down)? > I agree that views with CASTs and pushdowns help a lot but they are limited to column names and data types primarily. I can think of a few reasons why they are not sufficient (not sure if these were mentioned by Paul or not): 1. Constraints such as UNIQUEness (for primary keys), partitioning keys, collation property etc are not expressible in views. These attributes are heavily used by the optimizer. 2. We want to support collecting statistics via something like ANALYZE TABLE command. This would operate on the base table rather than views since we want to collect histograms and number of distinct-values stats etc on the base data such that it can be shared among all queries, not necessarily those against specific views. 3. INSERT INTO TABLE may need to do schema checks and schema merging before inserting the data. These would not be possible with views. -Aman