Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Julian Hyde Wed, 11 May 2016 10:05:38 -0700

We originally generated an alias for every table in the FROM clause. But we 
discovered that the aliases were not necessary and made the query less easy to 
understand. For example, the query you mentioned earlier would become


SELECT *
FROM "GOSALESMR".”PRODUCT_SURVEY_RESULTS” AS T0
INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC” AS T1 ON T0."PRODUCT_TOPIC_CODE" 
= T1.”PRODUCT_TOPIC_CODE"

So, we add aliases only if the table aliases that are automatically generated 
by the target system’s SQL validator are not unique.

This is the code that assigns aliases only if they are necessary:

public Result result(SqlNode node, Collection<Clause> clauses, RelNode rel) {
  final String alias2 = SqlValidatorUtil.getAlias(node, -1);
  final String alias3 = alias2 != null ? alias2 : "t";
  final String alias4 =
      SqlValidatorUtil.uniquify(
          alias3, aliasSet, SqlValidatorUtil.EXPR_SUGGESTER);
  final String alias5 = alias2 == null || !alias2.equals(alias4) ? alias4
      : null;
  return new Result(node, clauses, alias5,
      Collections.singletonList(Pair.of(alias4, rel.getRowType())));
}

You assert that 

SELECT *
FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
"PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
"PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE”

is not correct, but I disagree. I think it is fine. My understanding is that 
the table "GOSALESMR".”PRODUCT_SURVEY_TOPIC” will implicitly have an alias 
“PRODUCT_SURVEY_TOPIC” on all SQL systems, including DB2. If my understanding 
is wrong, then we do indeed have a problem. Does that query fail to validate, 
or give wrong results?

Julian


> On May 11, 2016, at 7:43 AM, Magnus Pierre <[email protected]> wrote:
> 
> Hello,
> After some additional digging I believe I found the effect of the problem at 
> least. Apache Calcite does in fact generate aliases for all tables part of 
> the joins, and the aliases are used in the on clause of the join, however 
> since JdbcImplementor does not populate the neededAlias list in the 
> JdbcImplementor.Result constructor (aliases seems to be populated with the 
> relevant information though), the check in asFrom
> 
> public SqlNode asFrom() {
>    return (SqlNode)(this.neededAlias != 
> null?SqlStdOperatorTable.AS.createCall(JdbcImplementor.POS, new 
> SqlNode[]{this.node, new SqlIdentifier(this.neededAlias, 
> JdbcImplementor.POS)}):this.node);
> }
> 
> returns this.node rather than creating the as part which I assume 
> SqlStdOperatorTable.AS.createCall
> should take care of, causing the join statement to be converted in half.
> 
> Reasonable?
> 
> Why the neededAlias and aliases? Any difference between the two?
> 
> /Magnus
> 
> 
> 
> 
>> 11 maj 2016 kl. 10:41 skrev Magnus Pierre <[email protected]>:
>> 
>> Hello,
>> I’m investigating a problem related to the JDBC storage plugin in Apache 
>> Drill which is utilizing Calcite for pushdown of SQL from the Drill query to 
>> the database, and I would like to get your view on the generated SQL:
>> 
>> I have a very simple SQL statement in Apache Drill joining two tables from 
>> the same JDBC database in Apache Drill and the push-down logic rewrites the 
>> Drill SQL statement to a JDBC statement and pushes it down to the database.
>> 
>> The statement is as follows: (against a DB2 database)
>> 
>> SELECT * FROM 
>> gs_db2.GOSALESMR.PRODUCT_SURVEY_RESULTS results
>> INNER JOIN
>>  gs_db2.GOSALESMR.PRODUCT_SURVEY_TOPIC topic
>>  ON results.PRODUCT_TOPIC_CODE = topic.PRODUCT_TOPIC_CODE
>> 
>> Which is converted to:
>> SELECT *
>> FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
>> "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
>> "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE"
>> 
>> Which is imho not correct since:
>> A) With databases with schemas, the fully qualified name to a table is 
>> schema.tablename, and in some databases even includes additional levels for 
>> instance username.schema.table. if I had a table with the same name in my 
>> default schema this query might even join the wrong table.
>> B) Even if it would be the complete path to the table, It is never a good 
>> idea imho to use the table name explicitly since the optimizer in the 
>> database might bring in N copies of the table instead of the one copy you 
>> intend to refer to (hence the use of aliases in SQL)
>> C) Not having aliases makes it really hard having multiple references to the 
>> same table in one SQL statement.
>> D) Not carrying over aliases from the calcite plan in Apache Drill to the 
>> generated JDBC statement must also mean that mapping of column references in 
>> the select clause becomes much harder since the column name is not including 
>> alias. (if it makes sense)
>> 
>> And in DB2 this statement generates an error due to table / column not 
>> found. With Aliases it works fine.
>> 
>> My questions are following:
>> Is it Drill that provides insufficient information to Apache Calcite, or is 
>> it Apache Calcite that does not consider aliases when generating the 
>> push-down query? Or is it trying to generate aliases and fails for some 
>> reason?
>> 
>> I think the most important question is if this behavior is intended, and if 
>> so the rationale behind it.
>> I’m trying to figure out if a bug should be/is opened and if so against 
>> which project.
>> 
>> Regards,
>> Magnus 
>> 
>> 
>> 
>> 
>

Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Reply via email to