Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Julian Hyde Thu, 12 May 2016 21:28:37 -0700

No dispute about aliases — they are a good idea, and we should use them, and in 
fact we do use them.


But every database I’ve come across (with the exception of DB2) has IMPLICIT 
aliases which are equivalent to the un-qualified table name. For example, in 
Oracle,

  select emp.empno, emp.deptno from sales.emp

is equivalent to 

  select emp.empno, emp.deptno from sales.emp as emp

This assumption has worked for many databases so I’d rather treat DB2 as the 
exception. I believe our approach is basically sound so I don’t want to turn 
everything upside down. The queries we generate are reasonably concise, and 
because they use aliases based on table names they are easier for humans to 
understand. If our approach is flawed, you will surely be able to break things 
and log bugs with test cases.

If someone were to add DB2 to https://github.com/vlsi/calcite-test-dataset 
<https://github.com/vlsi/calcite-test-dataset> then we could do much better 
testing. One test case is worth a dozen email messages.

Julian


> On May 12, 2016, at 12:56 AM, Magnus Pierre <[email protected]> wrote:
> 
> Understand. Just joined… :)
> 
> Just a comment on your reasoning around aliases. 
> 
> Aliases are the safest, and most practical route for all databases, and 
> leaves no room to the database to do crazy interpretations of what you try to 
> achieve. I fully understand that aliases are not for everyone, but being 
> system generated SQL that will go against databases and not humans, the use 
> of aliases will save query length (provided the aliases are shorter than 
> table names), allow for returning duplications of column names since the name 
> of the column will be alias.column-name, it will avoid cross join behavior 
> commonly seen in for instance Teradata when mixing aliases and non-aliases.
> 
> Not to mention that it could simplify the processing done by Calcite, and 
> hopefully also remove a set of error conditions for specific databases (like 
> DB2) where things are more strict than in others.
> 
> Aliases has been required to be supported if I remember correctly since 
> SQL-1999 so it should be a fairly safe feature to use for almost all JDBC 
> databases.
> I have never seen automatic generation of aliases for table names as 
> described. (But I must admit that for the last 15 or so years I’ve used 
> aliases at all cases so that does not tell much) 
> 
> Another aspect is that unqualified table references need to be evaluated by 
> the dbms against the current namespace (schema) that the user uses. If I have 
> a table in my currently used namespace (schema) with the same name as the 
> ”automatic alias” in a query, how will the database know if I am referring to 
> my table in the current namespace which is unqualified, or an in query alias 
> for the table in the from? 
> 
> Of course these are just my observations and I have complete trust in you 
> guys knowing far,far more than me about database internals and how a database 
> interprets the SQL language. I’m just a user even though I used to write some 
> dbms engine code (object relational things) back in the days.
> 
> I’ll open a JIRA late today, and will do the testing requested.
> 
> Regards,
> Magnus
> 
> --------
> You should join the list so that you get replies, and so I don’t have to 
> moderate each of
> your questions through.
> 
> Well, I’ve learned something about DB2 SQL. I’d never heard of a database 
> generating composite
> table aliases. We should fix the adapter so we generate the correct SQL for 
> DB2. The fix is
> probably to make the result() method’s behavior depend on 
> dialect.getDatabaseProduct().
> Can you log a JIRA case please?
> 
> Since you have access to DB2, could you also run Calcite’s whole test suite 
> against DB2?
> Let’s find out what other issues are there. Add a “DB2" entry to enum 
> CalciteAssert.DatabaseInstance
> and run with “-Dcalcite.test.db=DB2”.
> 
> Julian
> 
> 
> 
>> On May 11, 2016, at 1:50 PM, Magnus Pierre <[email protected]> wrote:
>> 
>> Hello Julian,
>> 
>> Not very convenient to not be part of the mailing list when asking 
>> questions, good that
> the calcite-dev archives list responses. :) 
>> 
>> But here is my claim:
>> 
>> The query:
>> 
>> SELECT *
>> FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
>> "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE"
>> = "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE”
>> 
>> 
>> Using squirrel returns: (as well as with Drill)
>> Error: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, 
>> SQLERRMC=PRODUCT_SURVEY_RESULTS.PRODUCT_TOPIC_CODE,
> DRIVER=4.13.127
>> SQLState:  42703
>> ErrorCode: -206
>> Error: DB2 SQL Error: SQLCODE=-727, SQLSTATE=56098, 
>> SQLERRMC=2;-206;42703;PRODUCT_SURVEY_RESULTS.PRODUCT_TOPIC_CODE,
> DRIVER=4.13.127
>> SQLState:  56098
>> ErrorCode: -727
>> 
>> 
>> 
>> In Drill:
>> 
>> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: The 
>> JDBC storage
> plugin failed while trying setup the SQL query. sql SELECT * FROM 
> "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
> "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE"
> = "PRODUCT_SURVEY_TOPIC"."PRODUCT_TOPIC_CODE" plugin gs_db2 Fragment 0:0 
> [Error Id: 9e5c6edc-8572-4de2-b9ad-dce8a74589b1
> on mpie-mbp.lan:31010] (com.ibm.db2.jcc.am.SqlSyntaxErrorException) DB2 SQL 
> Error: SQLCODE=-206,
> SQLSTATE=42703, SQLERRMC=PRODUCT_SURVEY_RESULTS.PRODUCT_TOPIC_CODE, 
> DRIVER=4.13.127 com.ibm.db2.jcc.am.id.a():677
> com.ibm.db2.jcc.am.id.a():60 com.ibm.db2.jcc.am.id.a():127 
> com.ibm.db2.jcc.am.no.c():2653
> com.ibm.db2.jcc.am.no.d():2641 com.ibm.db2.jcc.am.no.a():2090 
> com.ibm.db2.jcc.t4.cb.h():141
> com.ibm.db2.jcc.t4.cb.b():41 com.ibm.db2.jcc.t4.q.a():32 
> com.ibm.db2.jcc.t4.sb.i():135 com.ibm.db2.jcc.am.no.ib():2059
> com.ibm.db2.jcc.am.no.a():3130 com.ibm.db2.jcc.am.no.a():688 
> com.ibm.db2.jcc.am.no.executeQuery():672
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177 
> org.apache.drill.exec.physical.impl.ScanBatch.():108
> org.apache.drill.exec.physical.impl.ScanBatch.():136 
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():148
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():128
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171 
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():231
> org.apache.drill.common.SelfCleaningRunnable.run():38 
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615 
> java.lang.Thread.run():745
>> 
>> 
>> With aliases in Squirrel:
>> 
>> SELECT * FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS" AS 
>> "PRODUCT_SURVEY_RESULTS" 
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" AS "PRODUCT_SURVEY_TOPIC" 
>> ON "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
>> "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE"
>> 
>> Returns all data with no errors or warnings.
>> 
>> SELECT * FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS" AS PRODUCT_SURVEY_RESULTS
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" AS PRODUCT_SURVEY_TOPIC 
>> ON "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE" = 
>> "PRODUCT_SURVEY_TOPIC"."PRODUCT_TOPIC_CODE"
>> 
>> Returns all data with no errors or warnings.
>> 
>> I distinctly remember this being the same behavior for both Netezza and 
>> Teradata for
> instance. (even though it was a few years since I really did lots of SQL on 
> these systems).
> I can test Netezza but have no access to a TD box currently. 
>> 
>> Regards,
>> Magnus
>> 
>> —————————————
>> 
>> This is a response to:
>> —————————————
>> 
>> We originally generated an alias for every table in the FROM clause. But we 
>> discovered
> that
>> the aliases were not necessary and made the query less easy to understand. 
>> For example,
> the
>> query you mentioned earlier would become
>> 
>> SELECT *
>> FROM "GOSALESMR".”PRODUCT_SURVEY_RESULTS” AS T0
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC” AS T1 ON 
>> T0."PRODUCT_TOPIC_CODE" = T1.”PRODUCT_TOPIC_CODE"
>> 
>> So, we add aliases only if the table aliases that are automatically 
>> generated by the
> target
>> system’s SQL validator are not unique.
>> 
>> This is the code that assigns aliases only if they are necessary:
>> 
>> public Result result(SqlNode node, Collection<Clause> clauses, RelNode rel) {
>> final String alias2 = SqlValidatorUtil.getAlias(node, -1);
>> final String alias3 = alias2 != null ? alias2 : "t";
>> final String alias4 =
>>     SqlValidatorUtil.uniquify(
>>         alias3, aliasSet, SqlValidatorUtil.EXPR_SUGGESTER);
>> final String alias5 = alias2 == null || !alias2.equals(alias4) ? alias4
>>     : null;
>> return new Result(node, clauses, alias5,
>>     Collections.singletonList(Pair.of(alias4, rel.getRowType())));
>> }
>> 
>> You assert that 
>> 
>> SELECT *
>> FROM "GOSALESMR"."PRODUCT_SURVEY_RESULTS"
>> INNER JOIN "GOSALESMR"."PRODUCT_SURVEY_TOPIC" ON 
>> "PRODUCT_SURVEY_RESULTS"."PRODUCT_TOPIC_CODE"
>> = "PRODUCT_SURVEY_TOPIC".”PRODUCT_TOPIC_CODE”
>> 
>> is not correct, but I disagree. I think it is fine. My understanding is that 
>> the table
> "GOSALESMR".”PRODUCT_SURVEY_TOPIC”
>> will implicitly have an alias “PRODUCT_SURVEY_TOPIC” on all SQL systems, 
>> including
> DB2.
>> If my understanding is wrong, then we do indeed have a problem. Does that 
>> query fail
> to validate,
>> or give wrong results?
>> 
>> Julian
>> 
>> 
>> 
>

Re: Problem with Calcite generated JDBC pushdown SQL in Apache Drill

Reply via email to