I agree, we shouldn’t encourage implicit conversions. The model we are trying to support here is “schema on query”: the schema exists, that column J exists, and it is an integer. The user just didn’t bother to tell us about it until they were preparing a query.
There is a different model, which I’d call “schema on data”, which is what Drill does, and is more like the experience of programming JavaScript. In one record J might be an integer, the next record it might be absent, and the next record it might be a boolean. That model is compelling, but the EXTEND clause is not the way to approach it. So, if a column defined in an EXTEND clause exists in the table (modulo the current session’s case-sensitivity[1]) and does not have the same type, I think the validator should throw an error. Julian [1] https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlNameMatcher.html <https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlNameMatcher.html> > On Mar 25, 2017, at 5:21 PM, Kevin Liew <[email protected]> wrote: > > For the case that an extended column name collides with a column from the > base table with differing data-types, the current implementation > (CALCITE-1666, CALCITE-1702 WIP) is to validate the query successfully. > > Phoenix does not allow an extended column to have the same name as a column > from the base table, throwing: > > org.apache.phoenix.schema.ColumnAlreadyExistsException: ERROR 514 > (42892): A duplicate column name was detected in the object definition or > ALTER TABLE statement. columnName=TABLE.K > > A database might want to implement a physical plan that determines the > schema on-read. For instance, Phoenix allows upserting the same extended > column (with the same name in the same table on each upsert), but with > differing data-type per upsert. When querying back this data, Phoenix casts > all values of the extended column to the data-type that was specified in > the query. > > ie. > > create table TEST (I integer primary key) > > upsert into TEST(J integer) values (1, 2) > > upsert into TEST(J boolean) values (1, true) > > select i, j from TEST(J boolean) > >> 1, true; 1, true < > > That is the behavior for differing data-types of extended columns of the > same name. For consistency, this could also be applied to cases where an > extended column collides with a column of the base table. But maybe we > shouldn't encourage those implicit conversions done by redefining a base > column as an extended column with a different data-type when a cast and > alias could be used - so we should fail validation when an extended column > has the same name as a base column?
