[jira] [Updated] (DRILL-5216) Make use of FetchSize in JDBC storage plugin

JIRA Tue, 24 Jan 2017 20:03:44 -0800

     [ 
https://issues.apache.org/jira/browse/DRILL-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Bünger updated DRILL-5216:
---------------------------------
    Description: 
For remotely located database servers it usually is preferred to increase the 
fetch size for selected or all queries to limit number of roundtrips.

This could either be controlled via plugin config or on a query basis via 
comments.

One has also to take metadata retrieval into account, as in larger scenarios - 
as in mine - the Oracle cluster hosts thousands of schemas and the small 
fetchsize results in hundres of individual roundtrips.
In the end every Drill query against this storage takes at least a minute justs 
for querying the metadata.

Analysis so far for metadata-retrieving SQL calls:
So far, Drill is using the JDBC metadata API 
{{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and 
could set an appropriate fetchsize before iterating the result set.
I've tested this locally and improved latency a lot, but am note sure how this 
affects other non-oracle JDBC drivers.

The other (potentially long) query is the table enumeration.
>From what I've seen is Drill not calling the JDBC driver directly, but goes 
>through apache.calcite calling {{getTableNames()}} which under the hood calls 
>{{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow 
>metadata retrieval due to small default fetch size.

  was:
The metadata retrieval uses the default fetchsize for the underlying JDBC 
driver, which in case of Oracle is only 10.
In larger scenarios - as in mine - the Oracle cluster hosts thousands of 
schemas and the small fetchsize results in hundres of individual roundtrips.
In the end every Drill query against this storage takes at least a minute 
(server is remote)

So far, Drill is using the JDBC metadata API 
{{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and 
could set an appropriate fetchsize before iterating the result set.
I've tested this locally and improved latency a lot, but am note sure how this 
affects other non-oracle JDBC drivers.

The other (potentially long) query is the table enumeration.
>From what I've seen is Drill not calling the JDBC driver directly, but goes 
>through apache.calcite calling {{getTableNames()}} which under the hood calls 
>{{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow 
>metadata retrieval due to small default fetch size.


> Make use of FetchSize in JDBC storage plugin
> --------------------------------------------
>
>                 Key: DRILL-5216
>                 URL: https://issues.apache.org/jira/browse/DRILL-5216
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JDBC
>    Affects Versions: 1.9.0
>         Environment: drill-embedded on ubuntu client - connected to a remote 
> Oracle
>            Reporter: Thomas Bünger
>            Priority: Minor
>
> For remotely located database servers it usually is preferred to increase the 
> fetch size for selected or all queries to limit number of roundtrips.
> This could either be controlled via plugin config or on a query basis via 
> comments.
> One has also to take metadata retrieval into account, as in larger scenarios 
> - as in mine - the Oracle cluster hosts thousands of schemas and the small 
> fetchsize results in hundres of individual roundtrips.
> In the end every Drill query against this storage takes at least a minute 
> justs for querying the metadata.
> Analysis so far for metadata-retrieving SQL calls:
> So far, Drill is using the JDBC metadata API 
> {{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and 
> could set an appropriate fetchsize before iterating the result set.
> I've tested this locally and improved latency a lot, but am note sure how 
> this affects other non-oracle JDBC drivers.
> The other (potentially long) query is the table enumeration.
> From what I've seen is Drill not calling the JDBC driver directly, but goes 
> through apache.calcite calling {{getTableNames()}} which under the hood calls 
> {{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow 
> metadata retrieval due to small default fetch size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (DRILL-5216) Make use of FetchSize in JDBC storage plugin

Reply via email to