[
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746044#comment-14746044
]
Magnus Pierre commented on DRILL-3180:
--------------------------------------
Hello Jacques,
Sounds good. :)
Re the question:
I humbly disagree with the last sentence in your response. In almost 15 years
of Enterprise Data Warehousing, some of the most common queries I came across,
or wrote myself, were queries that dealt with time, and filter conditions as
part of the join clause were quite common.
Consider joining n tables, i.e. a much bigger query than the one expressed,
where you have history on most of the tables you join with. It is common to put
the filter condition in the join clause since:
1) It makes the query clearer and more readable, because the filter sits
together with the join condition (most often a left outer join) where the
filter is applied to the right-hand table.
2) It makes the query easier to maintain for the simple reason that you can
comment out a block of code without touching multiple places.
3) Legacy SQL could be supported, provided we support filters as part of the
join clause:
For some DB engines (Teradata, to mention one), it is common to put the filter
in the join clause since the optimizer is then more likely to apply the filter
before the join, at least on the ancient releases I worked with (even though it
should not matter from a query optimization perspective).
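A minimal sketch of point 1, assuming Python's built-in sqlite3 module and two
hypothetical tables (customer and address_hist, neither of which is from the
issue). It shows a time filter on the right-hand table of a left outer join,
once inside the join clause and once in the WHERE clause. In SQLite the two
placements return different rows, so the filter cannot always simply be moved
out of the join.

```python
import sqlite3

# Hypothetical schema for illustration only: a customer table and an
# address history table where '9999-12-31' marks the current row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address_hist (cust_id INTEGER, city TEXT, valid_to TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address_hist VALUES
        (1, 'Oslo',   '2010-12-31'),
        (1, 'Bergen', '9999-12-31'),
        (2, 'Malmo',  '2012-06-30');
""")

# Filter as part of the join clause: customers without a current
# address are kept, with NULL for the right-hand columns.
filter_in_join = conn.execute("""
    SELECT c.name, a.city
    FROM customer c
    LEFT OUTER JOIN address_hist a
           ON  c.cust_id = a.cust_id
           AND a.valid_to = '9999-12-31'
    ORDER BY c.cust_id
""").fetchall()

# Same filter in the WHERE clause: it runs after the join, so rows
# without a current address are dropped.
filter_in_where = conn.execute("""
    SELECT c.name, a.city
    FROM customer c
    LEFT OUTER JOIN address_hist a
           ON c.cust_id = a.cust_id
    WHERE a.valid_to = '9999-12-31'
    ORDER BY c.cust_id
""").fetchall()

print(filter_in_join)   # [('Ann', 'Bergen'), ('Bob', None)]
print(filter_in_where)  # [('Ann', 'Bergen')]
```

Keeping the filter next to the join condition also means the whole join block
can be commented out in one place, as described in point 2.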
By the same token, derived tables are commonly used in some databases (TD as an
example again) to ensure that a certain condition is processed before the join:
Example:
  SELECT *
  FROM customer c
  INNER JOIN (
      SELECT s0.x, s0.y, s0.z
      FROM table_1 s0
      WHERE s0.z < 100
  ) AS t1 ON c.cust_id = t1.x
Basically, this tries to circumvent certain limitations of query rewrite by
explicitly expressing the processing order, knowing that a large query is hard
to untangle for most optimizers.
It should not matter for a mature optimizer, but for some it does.
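As a rough illustration, the sketch below runs the derived-table query from the
example next to the same filter written directly in the join clause. It assumes
Python's built-in sqlite3 module and made-up sample rows. The table and column
names come from the example, but the column types are guesses. For an inner
join both forms return the same rows, so any difference between them is purely
a matter of how a given engine plans the query.

```python
import sqlite3

# Sample data is invented for illustration; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY);
    CREATE TABLE table_1 (x INTEGER, y TEXT, z INTEGER);
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO table_1 VALUES (1, 'a', 50), (2, 'b', 150), (3, 'c', 99);
""")

# Derived table: the filter is written so that it reads as being
# applied before the join.
derived = conn.execute("""
    SELECT c.cust_id, t1.y, t1.z
    FROM customer c
    INNER JOIN (
        SELECT s0.x, s0.y, s0.z FROM table_1 s0 WHERE s0.z < 100
    ) AS t1 ON c.cust_id = t1.x
    ORDER BY c.cust_id
""").fetchall()

# Same filter expressed as part of the join clause.
join_filter = conn.execute("""
    SELECT c.cust_id, s0.y, s0.z
    FROM customer c
    INNER JOIN table_1 s0
            ON c.cust_id = s0.x AND s0.z < 100
    ORDER BY c.cust_id
""").fetchall()

print(derived)                 # [(1, 'a', 50), (3, 'c', 99)]
print(derived == join_filter)  # True
```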
So to conclude: It is important to support both cases, since for some engines
it will make a difference in efficiency and processing order.
> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and
> Netezza from Apache Drill
> ---------------------------------------------------------------------------------------------------
>
> Key: DRILL-3180
> URL: https://issues.apache.org/jira/browse/DRILL-3180
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Affects Versions: 1.0.0
> Reporter: Magnus Pierre
> Assignee: Jacques Nadeau
> Labels: Drill, JDBC, plugin
> Fix For: 1.2.0
>
> Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill.
> The code is primitive but constitutes a good starting point for further
> coding. Today it provides primitive support for SELECT against RDBMS with
> JDBC.
> The goal is to provide complete SELECT support against RDBMS with push down
> capabilities.
> Currently the code is using standard JDBC classes.