Hello,
I’m investigating a problem related to the JDBC storage plugin in Apache Drill 
which is utilizing Calcite for pushdown of SQL from the Drill query to the 
database, and I would like to get your view on the generated SQL:

I have a very simple SQL statement in Apache Drill joining two tables from the 
same JDBC database in Apache Drill and the push-down logic rewrites the Drill 
SQL statement to a JDBC statement and pushes it down to the database.

The statement is as follows: (against a DB2 database)

SELECT * FROM 
 gs_db2.GOSALESMR.PRODUCT_SURVEY_RESULTS results
 INNER JOIN
  gs_db2.GOSALESMR.PRODUCT_SURVEY_TOPIC topic
  ON results.PRODUCT_TOPIC_CODE = topic.PRODUCT_TOPIC_CODE

Which is converted to:
SELECT *
FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
"PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
"PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE"

Which is imho not correct since:
A) With databases with schemas, the fully qualified name to a table is 
schema.tablename, and in some databases even includes additional levels for 
instance username.schema.table. if I had a table with the same name in my 
default schema this query might even join the wrong table.
B) Even if it would be the complete path to the table, It is never a good idea 
imho to use the table name explicitly since the optimizer in the database might 
bring in N copies of the table instead of the one copy you intend to refer to 
(hence the use of aliases in SQL)
C) Not having aliases makes it really hard having multiple references to the 
same table in one SQL statement.
D) Not carrying over aliases from the calcite plan in Apache Drill to the 
generated JDBC statement must also mean that mapping of column references in 
the select clause becomes much harder since the column name is not including 
alias. (if it makes sense)

And in DB2 this statement generates an error due to table / column not found. 
With Aliases it works fine.

My questions are following:
Is it Drill that provides insufficient information to Apache Calcite, or is it 
Apache Calcite that does not consider aliases when generating the push-down 
query? Or is it trying to generate aliases and fails for some reason?

I think the most important question is if this behavior is intended, and if so 
the rationale behind it.
I’m trying to figure out if a bug should be/is opened and if so against which 
project.

Regards,
Magnus 




Reply via email to