I agree, we shouldn’t encourage implicit conversions. The model we are trying 
to support here is “schema on query”: the schema exists, that column J exists, 
and it is an integer. The user just didn’t bother to tell us about it until 
they were preparing a query.

There is a different model, which I’d call “schema on data”, which is what 
Drill does, and is more like the experience of programming JavaScript. In one 
record J might be an integer, the next record it might be absent, and the next 
record it might be a boolean. That model is compelling, but the EXTEND clause 
is not the way to approach it.

So, if a column defined in an EXTEND clause exists in the table (modulo the 
current session’s case-sensitivity[1]) and does not have the same type, I think 
the validator should throw an error.

Julian

[1] 
https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlNameMatcher.html
 
<https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlNameMatcher.html>
 

> On Mar 25, 2017, at 5:21 PM, Kevin Liew <[email protected]> wrote:
> 
> For the case that an extended column name collides with a column from the
> base table with differing data-types, the current implementation
> (CALCITE-1666, CALCITE-1702 WIP) is to validate the query successfully.
> 
> Phoenix does not allow an extended column to have the same name as a column
> from the base table, throwing:
> 
>    org.apache.phoenix.schema.ColumnAlreadyExistsException: ERROR 514
> (42892): A duplicate column name was detected in the object definition or
> ALTER TABLE statement. columnName=TABLE.K
> 
> A database might want to implement a physical plan that determines the
> schema on-read. For instance, Phoenix allows upserting the same extended
> column (with the same name in the same table on each upsert), but with
> differing data-type per upsert. When querying back this data, Phoenix casts
> all values of the extended column to the data-type that was specified in
> the query.
> 
>    ie.
> 
>        create table TEST (I integer primary key)
> 
>        upsert into TEST(J integer) values (1, 2)
> 
>        upsert into TEST(J boolean) values (1, true)
> 
>        select i, j from TEST(J boolean)
> 
>> 1, true; 1, true <
> 
> That is the behavior for differing data-types of extended columns of the
> same name. For consistency, this could also be applied to cases where an
> extended column collides with a column of the base table. But maybe we
> shouldn't encourage those implicit conversions done by redefining a base
> column as an extended column with a different data-type when a cast and
> alias could be used - so we should fail validation when an extended column
> has the same name as a base column?

Reply via email to