[jira] [Comment Edited] (DRILL-5216) Make use of FetchSize in JDBC storage plugin

Bernd Lunghamer (JIRA) Fri, 15 Mar 2019 09:18:15 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793747#comment-16793747
 ]


Bernd Lunghamer edited comment on DRILL-5216 at 3/15/19 4:17 PM:
-----------------------------------------------------------------

JdbcStoragePlugin per default enumerates complete metadata of the database for 
every query executed. It recursivley pulls down metadata for
 * all catalogs
 * all schemata 

of the underlying JDBC connection. It does not honour which catalog and schema 
the user is connected to a connection level. 

That way we are experiencing queries that take 2 second in oracle without 
metadata discovery to take 3 minutes because all schema information is 
requested in every query. And we have a weak network link (roundtrip latencies).

I patched JdbcStorage Plugin with a configuration option to filter schemata and 
catalog to those of the underlying connection

so it pulls only metadata from getSchema() vs. getSchemas() as it is now . It 
evaluates only getCatalog() vs. getCatalog() as it is now.  This slashed query 
roundtrip times to accepatble level even without fixing the fetchSize.

Would there be any interest do merge this upstream?


was (Author: blunghamer):
 
JdbcStoragePlugin per default enumerates * all catalog
 * all schemata 

of the underlying JDBC connection no matter which catalog and schema the user 
is connected to a connection level. That way we are experiencing queries that 
take 2 second in oracle without metadata discovery on a connection level to 
take 3 minutes because all schema information is requested in every query. An 
we have a weak network link (roundtrip latencies).

I patched JdbcStorage Plugin with a configuration option to filter schemata and 
catalog to those of the underlying connection getSchema() vs. getSchemas(), 
getCatalog() vs. getCatalog(). Would there be any interest do merge this 
upstream?

> Make use of FetchSize in JDBC storage plugin
> --------------------------------------------
>
>                 Key: DRILL-5216
>                 URL: https://issues.apache.org/jira/browse/DRILL-5216
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JDBC
>    Affects Versions: 1.9.0
>         Environment: drill-embedded on ubuntu client - connected to a remote 
> Oracle
>            Reporter: Thomas Bünger
>            Priority: Minor
>
> For remotely located database servers it usually is preferred to increase the 
> fetch size for selected or all queries to limit number of roundtrips.
> This could either be controlled via plugin config or on a query basis via 
> comments.
> One has also to take metadata retrieval into account, as in larger scenarios 
> - as in mine - the Oracle cluster hosts thousands of schemas and the small 
> fetchsize results in hundres of individual roundtrips.
> In the end every Drill query against this storage takes at least a minute 
> justs for querying the metadata.
> Analysis so far for metadata-retrieving SQL calls:
> So far, Drill is using the JDBC metadata API 
> {{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and 
> could set an appropriate fetchsize before iterating the result set.
> I've tested this locally and improved latency a lot, but am note sure how 
> this affects other non-oracle JDBC drivers.
> The other (potentially long) query is the table enumeration.
> From what I've seen is Drill not calling the JDBC driver directly, but goes 
> through apache.calcite calling {{getTableNames()}} which under the hood calls 
> {{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow 
> metadata retrieval due to small default fetch size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (DRILL-5216) Make use of FetchSize in JDBC storage plugin

Reply via email to