[ 
https://issues.apache.org/jira/browse/CALCITE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926383#comment-16926383
 ] 

Gabriel Reid commented on CALCITE-3333:
---------------------------------------

Thanks for taking a look [~julianhyde].

Surfacing it through the connect string sounds good. So there would be a few 
new URL parameters that could be added:
 * frame_rows_max
 * frame_millis_max
 * frame_bytes_max (although this one maybe wouldn't be implemented yet)

If any of these is provided in a connection URL, then frame limiting will be 
performed based on the first of these limits that is reached when constructing 
a single frame. Not providing these parameters leaves everything as-is.

About the limiting based on "time" – I think that there's a risk that I'll be 
misinterpreting the definitions of "time spent on the server" and "round-trip 
time", so I'll give a more concrete example. 

I've got an Calcite-based JDBC driver that sends retrieves data from various 
underlying REST interfaces, and this Calcite JDBC driver is hosted within 
Avatica. Some SQL queries that are sent to the Calcite driver only return rows 
at a very low rate (for example, joins that can't be pushed down to the backend 
REST services, or just basic scans that have a very low throughput). In these 
cases, it can take a long time (i.e. more than 30 seconds) to retrieve 100 rows 
for a single frame within Avatica server, which can then cause the HTTP 
connection between the Avatica thin driver and Avatica server to time out (due 
to networking infrastructure between these two things).

The "time" that I'm talking about here is the time spent within Avatica server 
while performing a single {{Meta.prepareAndExecute()}} or {{Meta.fetch()}} 
call. The RPC calls that I'm talking about are the same {{prepareAndExecute()}} 
and {{fetch()}} calls.

My understanding of your comment was that you were also talking about taking 
RPC calls to underlying services (i.e. the REST calls from the Calcite driver 
to the other underlying REST services in my example) into account, as there may 
be a fair bit of overhead involved in these calls. I see this as outside the 
scope of this specific ticket, as the details of these underlying RPC calls are 
not visible to Avatica server.

So concretely, I'd say that I'll add the {{frame_rows_max}} and 
{{frame_millis_max}} URL parameters and push these through to the underlying 
code. Does that sound good? Any comments on the naming of these parameters?

> Add time-based ResultSet frame size limiting
> --------------------------------------------
>
>                 Key: CALCITE-3333
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3333
>             Project: Calcite
>          Issue Type: New Feature
>          Components: avatica
>            Reporter: Gabriel Reid
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The size of a single JDBC ResultSet frame returned in a single 
> {{prepareAndExecute}} or {{fetch}} invocation is currently 100, meaning that 
> each retrieval of a portion of a ResultSet will send 100 rows over the wire. 
> This frame size may be too big in some situations, and too small in other 
> situations.
> If the underlying data source being queried can provide thousands of (small) 
> records per second, then only reading them at 100 per RPC call will be 
> unnecessarily slow.
> On the other hand, if the underlying data source is only providing records at 
> a rate of 1 per second, then it will take 100 seconds for each RPC call to 
> return, which can lead to timeouts (particularly if Avatica server is sitting 
> behind a proxy that has a strict request timeout).
> The main factors to take into account when finding an ideal size of frame to 
> return for each RPC call are:
> * make the frames small enough that they don't overload either Avatica server 
> or the client with overly large amounts of data at one time
> * make the frames large enough so that the percentage of total query time 
> that is spent only on RPC overhead is minimized
> The general idea of this ticket is to add a pluggable "frame size limiting" 
> functionality so that frame size limiting can be done based on the number of 
> rows, number of bytes, amount of time spent building a frame, or any other 
> property or combination of properties.
> Note that CALCITE-2322 contains some work to allow configuring the size of a 
> single frame on a Connection or Statement (via the {{setFetchSize}} method), 
> although it's not yet merged in. That ticket would also be useful, and does 
> not conflict with the general intent of this ticket.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to