[jira] [Commented] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill

Magnus Pierre (JIRA) Tue, 15 Sep 2015 12:59:26 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746044#comment-14746044
 ]


Magnus Pierre commented on DRILL-3180:
--------------------------------------

Hello Jacques,
Sounds good. :)
Re the question:
I humbly disagree with the last sentence in your response. In almost 15 
years of Enterprise Data Warehousing, some of the most common queries I came 
across, or wrote myself, were queries that dealt with time, and filter 
conditions as part of the join clause were quite common.

Consider joining n tables, i.e. a much bigger query than the one expressed, 
where most of the tables you join with keep history. There it is common to put 
the filter condition in the join clause, since:

1) It makes the query clearer and more readable, because the filter sits 
together with the join condition (most often a left outer join) where the 
filter is applied to the right-hand table.

2) It makes the query easier to maintain for the simple reason that you can 
comment out a block of code without touching multiple places.

3) It lets us support legacy SQL that relies on filters as part of the join 
clause:
For some DB engines (Teradata, to mention one), it is common to put the filter 
in the join clause, since it is more likely that the optimizer will be able to 
apply the filter before the join, at least on the ancient releases I worked 
with (even though it should not matter from a query optimization perspective).
By the same token, derived tables are commonly used in some databases (TD as an 
example again) to ensure that a certain condition is processed before the join. 
Example:

  SELECT *
  FROM customer c
  INNER JOIN (SELECT s0.x, s0.y, s0.z
              FROM table_1 s0
              WHERE s0.z < 100) AS t1
    ON c.cust_id = t1.x

Basically, this is an attempt to circumvent certain limitations of query 
rewrite by explicitly expressing the processing order, knowing that a large 
query is hard to untangle for most optimizers.
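
To make the difference concrete, here is a minimal sketch that runs the three 
forms side by side. It uses Python's built-in sqlite3 module purely as a 
stand-in engine (not Drill or Teradata), and the `name` column and the sample 
rows are made up for illustration. For a left outer join, the filter in the 
join clause and the derived table give the same rows, while the same filter 
moved to the WHERE clause drops the unmatched customers.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the issue).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, name TEXT);
    CREATE TABLE table_1 (x INTEGER, y TEXT, z INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO table_1 VALUES (1, 'a', 50), (2, 'b', 150);
""")

# Form A: filter on the right-hand table expressed in the join clause.
on_filter = conn.execute("""
    SELECT c.cust_id, t1.z
    FROM customer c
    LEFT OUTER JOIN table_1 t1
      ON c.cust_id = t1.x AND t1.z < 100
    ORDER BY c.cust_id
""").fetchall()

# Form B: derived table that applies the filter before the join.
derived = conn.execute("""
    SELECT c.cust_id, t1.z
    FROM customer c
    LEFT OUTER JOIN (SELECT s0.x, s0.y, s0.z
                     FROM table_1 s0
                     WHERE s0.z < 100) AS t1
      ON c.cust_id = t1.x
    ORDER BY c.cust_id
""").fetchall()

# Form C: same filter in the WHERE clause, evaluated after the outer join,
# so rows whose right-hand side is NULL are discarded.
where_filter = conn.execute("""
    SELECT c.cust_id, t1.z
    FROM customer c
    LEFT OUTER JOIN table_1 t1
      ON c.cust_id = t1.x
    WHERE t1.z < 100
    ORDER BY c.cust_id
""").fetchall()

print(on_filter)     # every customer is kept, z is NULL where no row qualifies
print(derived)       # same rows as form A
print(where_filter)  # only customers with a qualifying table_1 row
```

Forms A and B keep all three customers, whereas form C returns only the 
customer with a matching row where z < 100. So for outer joins the placement 
of the filter is a matter of semantics, not only of style or optimizer hints.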

It should not matter for a mature optimizer, but for some it does.
So to conclude: It is important to support both cases, since for some engines 
it will make a difference in efficiency and processing order.
 



> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and 
> Netezza from Apache Drill
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3180
>                 URL: https://issues.apache.org/jira/browse/DRILL-3180
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 1.0.0
>            Reporter: Magnus Pierre
>            Assignee: Jacques Nadeau
>              Labels: Drill, JDBC, plugin
>             Fix For: 1.2.0
>
>         Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. 
> The code is primitive but constitutes a good starting point for further 
> coding. Today it provides primitive support for SELECT against RDBMS with 
> JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down 
> capabilities.
> Currently the code is using standard JDBC classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
