Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Magnus Pierre Wed, 11 May 2016 07:48:06 -0700

Hello,
After some additional digging I believe I found the effect of the problem at 
least. Apache Calcite does in fact generate aliases for all tables part of the 
joins, and the aliases are used in the on clause of the join, however since 
JdbcImplementor does not populate the neededAlias list in the 
JdbcImplementor.Result constructor (aliases seems to be populated with the 
relevant information though), the check in asFrom


public SqlNode asFrom() {
    return (SqlNode)(this.neededAlias != 
null?SqlStdOperatorTable.AS.createCall(JdbcImplementor.POS, new 
SqlNode[]{this.node, new SqlIdentifier(this.neededAlias, 
JdbcImplementor.POS)}):this.node);
}

returns this.node rather than creating the as part which I assume 
SqlStdOperatorTable.AS.createCall
should take care of, causing the join statement to be converted in half.

Reasonable?

Why the neededAlias and aliases? Any difference between the two?

/Magnus




> 11 maj 2016 kl. 10:41 skrev Magnus Pierre <[email protected]>:
> 
> Hello,
> I’m investigating a problem related to the JDBC storage plugin in Apache 
> Drill which is utilizing Calcite for pushdown of SQL from the Drill query to 
> the database, and I would like to get your view on the generated SQL:
> 
> I have a very simple SQL statement in Apache Drill joining two tables from 
> the same JDBC database in Apache Drill and the push-down logic rewrites the 
> Drill SQL statement to a JDBC statement and pushes it down to the database.
> 
> The statement is as follows: (against a DB2 database)
> 
> SELECT * FROM 
>  gs_db2.GOSALESMR.PRODUCT_SURVEY_RESULTS results
>  INNER JOIN
>   gs_db2.GOSALESMR.PRODUCT_SURVEY_TOPIC topic
>   ON results.PRODUCT_TOPIC_CODE = topic.PRODUCT_TOPIC_CODE
> 
> Which is converted to:
> SELECT *
> FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
> "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
> "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE"
> 
> Which is imho not correct since:
> A) With databases with schemas, the fully qualified name to a table is 
> schema.tablename, and in some databases even includes additional levels for 
> instance username.schema.table. if I had a table with the same name in my 
> default schema this query might even join the wrong table.
> B) Even if it would be the complete path to the table, It is never a good 
> idea imho to use the table name explicitly since the optimizer in the 
> database might bring in N copies of the table instead of the one copy you 
> intend to refer to (hence the use of aliases in SQL)
> C) Not having aliases makes it really hard having multiple references to the 
> same table in one SQL statement.
> D) Not carrying over aliases from the calcite plan in Apache Drill to the 
> generated JDBC statement must also mean that mapping of column references in 
> the select clause becomes much harder since the column name is not including 
> alias. (if it makes sense)
> 
> And in DB2 this statement generates an error due to table / column not found. 
> With Aliases it works fine.
> 
> My questions are following:
> Is it Drill that provides insufficient information to Apache Calcite, or is 
> it Apache Calcite that does not consider aliases when generating the 
> push-down query? Or is it trying to generate aliases and fails for some 
> reason?
> 
> I think the most important question is if this behavior is intended, and if 
> so the rationale behind it.
> I’m trying to figure out if a bug should be/is opened and if so against which 
> project.
> 
> Regards,
> Magnus 
> 
> 
> 
>

Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Reply via email to