paul-rogers commented on PR #13196:
URL: https://github.com/apache/druid/pull/13196#issuecomment-1272466167

   @FrankChen021, you get the Fastest Review Ever award! Thanks!
   
   > I always thought the HTTP timeout is a client behavior, and didn't know 
   > that it can be controlled by the server. And I'm curious that is such 
   > internal feature implemented by Avatica JDBC driver only? Could you show 
   > me some materials to learn more about it?
   
   You are right: the client (and any proxies, and the Router) each impose a 
timeout; the shortest timeout wins. The challenge is when that HTTP timeout is shorter 
than the amount of time Druid needs to compute the next batch of results. In a 
regular REST API via the `/sql` endpoint, we have to compute and return all the 
results within this timeout. This is usually not a problem because Druid is 
intended for simple queries: those that return in 100ms with a few hundred or 
thousand rows, and that exploit Druid's time partitioning, filters, etc.
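   To make the request/response constraint concrete, here is a hedged sketch of a plain REST query against the Druid SQL endpoint using the JDK's `HttpClient` (the broker host and table name are placeholders; the endpoint path and JSON body shape are the standard Druid SQL API):

   ```java
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.time.Duration;

   public class SqlRestSketch {
       public static void main(String[] args) throws Exception {
           // Placeholder broker host and table name.
           String body = "{\"query\": \"SELECT COUNT(*) FROM my_table\"}";
           HttpRequest request = HttpRequest.newBuilder()
               .uri(URI.create("http://broker-host:8082/druid/v2/sql"))
               .header("Content-Type", "application/json")
               // The entire result set must be computed and returned before
               // this timeout fires; there is no way to ask for "more later".
               .timeout(Duration.ofSeconds(30))
               .POST(HttpRequest.BodyPublishers.ofString(body))
               .build();
           HttpResponse<String> response = HttpClient.newHttpClient()
               .send(request, HttpResponse.BodyHandlers.ofString());
           System.out.println(response.body());
       }
   }
   ```

   If Druid needs longer than that single timeout to produce all the rows, this style of call simply fails.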
   
   The challenge is with the occasional "BI" style query that, for whatever 
reason, returns many rows, or takes a long time to compute. Maybe you've got 
billions of rows and need to find the one event, anywhere in that set, that 
represents access to some resource. There is nothing for it but to scan all the 
data, and that might take a while depending on the filter used. Or, maybe 
someone wants to grab several million rows of data to feed into an ML model. 
And so on.
   
   Normally you would not want to run such a query on Druid: that's not what 
Druid is designed for. But sometimes you just gotta do what you gotta do. In 
this case, if your query takes a minute to run, and one of the network elements 
in the chain has a 30-second timeout, you can't successfully run the query with 
simple request/response.
   
   But, JDBC is different. It works by sending multiple requests for each 
query, each returning some number of rows, generally in the thousands. The 
client keeps asking for more batches until the server returns EOF. So, if we 
had a long-running query, but each batch of records could be computed in less 
time than the HTTP timeout, the query would succeed.
   
   Most of the time in a big query, however, is the work before we get the 
first row. So, we want to generalize the above idea. JDBC allows us to return a 
batch of 0 rows. If so, the client just turns around and requests another batch 
until it gets rows or is told that an EOF occurred. Just to be clear, this 
polling is a feature of the Avatica client. The application just iterates 
through a `ResultSet`, blissfully ignorant of the goings-on under the covers.
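   From the application's point of view, the batching and polling described above is invisible. Here is a hedged sketch of the client side (the broker host and query are placeholders; the JDBC URL follows the documented Druid Avatica endpoint):

   ```java
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   public class JdbcClientSketch {
       public static void main(String[] args) throws Exception {
           // Placeholder broker host; this is the standard Druid Avatica URL form.
           String url =
               "jdbc:avatica:remote:url=http://broker-host:8082/druid/v2/sql/avatica/";
           try (Connection conn = DriverManager.getConnection(url);
                Statement stmt = conn.createStatement()) {
               stmt.setFetchSize(5000);  // rows per batch: a hint to the driver
               try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
                   // Blocks across batches; the Avatica driver issues one HTTP
                   // fetch per batch and silently re-polls on zero-row batches.
                   while (rs.next()) {
                       System.out.println(rs.getString(1));
                   }
               }
           }
       }
   }
   ```

   The `while (rs.next())` loop never sees an empty batch; the driver absorbs them and keeps fetching.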
   
   That's what the async thing in this PR does for us. We tell Druid to go get 
a batch of rows. If this takes longer than the *fetch timeout*, we return an 
empty batch, and remember that Druid is busily working away to get us a batch. 
Maybe it will be ready next time. If so, we return it. If not, we again return 
an empty batch and the whole thing repeats.
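   The mechanics can be sketched in plain Java. This is not the PR's actual code; it is a minimal illustration of the idea, with assumed names: each poll waits at most the fetch timeout for the batch a worker thread is computing, returns an empty batch on timeout, and keeps the same pending work for the next poll.

   ```java
   import java.util.List;
   import java.util.concurrent.Callable;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.TimeoutException;

   // Illustrative sketch only; class and method names are invented.
   public class AsyncFetchSketch {
       private final ExecutorService exec = Executors.newSingleThreadExecutor();
       private Future<List<String>> pending;  // the batch still being computed

       // Called once per Avatica fetch request.
       public List<String> poll(Callable<List<String>> batchSupplier,
                                long fetchTimeoutMs) throws Exception {
           if (pending == null) {
               // Start computing the next batch in the background.
               pending = exec.submit(batchSupplier);
           }
           try {
               List<String> batch = pending.get(fetchTimeoutMs, TimeUnit.MILLISECONDS);
               pending = null;    // batch done; next poll starts a fresh one
               return batch;
           } catch (TimeoutException e) {
               // Not ready yet: return an empty batch within the HTTP timeout;
               // the client will turn around and poll again.
               return List.of();
           }
       }

       public void shutdown() {
           exec.shutdownNow();
       }
   }
   ```

   The first poll against a slow batch returns empty; a later poll, once the background work finishes, returns the rows.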
   
   For this to work, the fetch timeout has to be less than the HTTP timeout. 
Given jitter, we might use the Nyquist rule and set the fetch timeout to half 
of the HTTP timeout. It doesn't have to be exact; any value less than half the 
network timeout should be fine.
   
   Voila! We've decoupled query run time from network timeouts by running the 
query (or at least each batch fetch) asynchronously with the Avatica REST HTTP 
requests. So, you see, we don't set the HTTP timeout, we just work around it by 
*always* returning within the timeout, knowing that the client will do the 
right thing if we return 0 rows to stay within the timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

