Hello, I’m investigating a problem related to the JDBC storage plugin in Apache Drill which is utilizing Calcite for pushdown of SQL from the Drill query to the database, and I would like to get your view on the generated SQL:
I have a very simple SQL statement in Apache Drill joining two tables from the same JDBC database in Apache Drill and the push-down logic rewrites the Drill SQL statement to a JDBC statement and pushes it down to the database. The statement is as follows: (against a DB2 database) SELECT * FROM gs_db2.GOSALESMR.PRODUCT_SURVEY_RESULTS results INNER JOIN gs_db2.GOSALESMR.PRODUCT_SURVEY_TOPIC topic ON results.PRODUCT_TOPIC_CODE = topic.PRODUCT_TOPIC_CODE Which is converted to: SELECT * FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS" INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE" Which is imho not correct since: A) With databases with schemas, the fully qualified name to a table is schema.tablename, and in some databases even includes additional levels for instance username.schema.table. if I had a table with the same name in my default schema this query might even join the wrong table. B) Even if it would be the complete path to the table, It is never a good idea imho to use the table name explicitly since the optimizer in the database might bring in N copies of the table instead of the one copy you intend to refer to (hence the use of aliases in SQL) C) Not having aliases makes it really hard having multiple references to the same table in one SQL statement. D) Not carrying over aliases from the calcite plan in Apache Drill to the generated JDBC statement must also mean that mapping of column references in the select clause becomes much harder since the column name is not including alias. (if it makes sense) And in DB2 this statement generates an error due to table / column not found. With Aliases it works fine. My questions are following: Is it Drill that provides insufficient information to Apache Calcite, or is it Apache Calcite that does not consider aliases when generating the push-down query? Or is it trying to generate aliases and fails for some reason? I think the most important question is if this behavior is intended, and if so the rationale behind it. I’m trying to figure out if a bug should be/is opened and if so against which project. Regards, Magnus
