[ 
https://issues.apache.org/jira/browse/IMPALA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873130#comment-17873130
 ] 

ASF subversion and git services commented on IMPALA-13294:
----------------------------------------------------------

Commit d8a8412c2b750937a3577b08d81ffd9a16269b83 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d8a8412c2 ]

IMPALA-13294: Add support for long polling to avoid client side wait

Currently, the client makes an execute call to Impala, then polls
waiting for the operation to finish (or error out). The client
sleeps between polls, and this sleep time can be a substantial
percentage of a short query's execution time.

To reduce this client side sleep, this implements long polling to
provide an option to wait for query completion on the server side.
This is controlled by the long_polling_time_ms query option. If it
is set to a value greater than zero, status RPCs will wait for query
completion for up to that amount of time. This defaults to off (0ms).
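
As a rough illustration of the server-side wait described above, here is
a minimal Python sketch. It is not Impala's implementation (the backend
is C++); the QueryHandle class, its mark_finished and get_status methods,
and the state strings are hypothetical names chosen for this example.
Only the long_polling_time_ms option name comes from the commit message.

```python
import threading


class QueryHandle:
    """Hypothetical stand-in for a server-side query record."""

    def __init__(self):
        self._done = threading.Event()
        self.state = "RUNNING"

    def mark_finished(self, state="FINISHED"):
        # Called by the (hypothetical) execution thread on completion or error.
        self.state = state
        self._done.set()

    def get_status(self, long_polling_time_ms=0):
        # With the option off (0), return immediately, as before.
        # Otherwise block until the query completes or the timeout expires.
        if long_polling_time_ms > 0:
            self._done.wait(timeout=long_polling_time_ms / 1000.0)
        return self.state
```

If the query completes during the wait, the status call returns right
away with the final state; if not, the call times out and returns the
current state, and the client issues another status RPC.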

Both Beeswax and HS2 add a wait for query completion in their
get status calls (get_state for Beeswax, GetOperationStatus for HS2).
This doesn't wait in the execute RPC calls (e.g. query for Beeswax,
ExecuteStatement for HS2), because neither includes the query status
in the response. The client will always need to do a separate status
RPC.

This modifies impala-shell and the beeswax client to avoid doing a
sleep if the get_state/GetOperationStatus calls take longer than
they would have slept. In other words, if they would have slept 50ms,
then they skip that sleep if the RPC to the server took longer than
50ms. This allows the client to maintain its sleep behavior with
older Impalas that don't use long polling while adapting properly
to systems that do have long polling. This has the added benefit
of adjusting for high latency to the server. This does not change
any of the sleep times.
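
The client-side adjustment can be sketched roughly as follows. This is
an illustration under assumptions, not the actual impala-shell code: the
wait_for_completion function, the get_status callable (standing in for
get_state/GetOperationStatus), the state strings, and the injectable
clock and sleep parameters are all hypothetical.

```python
import time


def wait_for_completion(get_status, sleep_s=0.05,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state.

    get_status is a hypothetical callable that performs one status RPC
    and returns a state string.
    """
    while True:
        start = clock()
        state = get_status()
        rpc_s = clock() - start
        if state in ("FINISHED", "ERROR"):
            return state
        # Sleep only if the RPC was shorter than the usual sleep time.
        # The sleep time itself is left unchanged.
        if rpc_s < sleep_s:
            sleep(sleep_s)
```

Against a server without long polling, each status RPC returns quickly
and the loop sleeps as it always did. Against a server that waits (or a
high-latency link), the RPC already takes longer than the sleep time, so
the loop polls again immediately.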

Testing:
 - This adds a test case in test_hs2.py to verify the long
   polling behavior

Change-Id: I72ca595c5dd8a33b936f078f7f7faa5b3f0f337d
Reviewed-on: http://gerrit.cloudera.org:8080/19205
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add option to use long polling for get_state/GetOperationStatus
> ---------------------------------------------------------------
>
>                 Key: IMPALA-13294
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13294
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend, Clients
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Currently, Impala does an execute call, then the client polls waiting for the 
> operation to finish (or error out). The client sleeps between polls, and this 
> sleep time can be a substantial percentage of a short query's execution time.
> Long polling allows status RPCs like HS2's GetOperationStatus and Beeswax's 
> get_state to wait for query completion on the server side for a configurable 
> amount of time. If the query completes during that time, the client can be 
> notified immediately. If the query does not complete, then the wait times out 
> and the client does a new status RPC.
> Clients can adjust to this functionality by keeping track of how much time 
> was spent in the status RPC, then only doing a client side sleep if the RPC 
> was shorter than the desired sleep time. This allows a client to maintain its 
> old behavior with old Impalas that don't have long polling while avoiding 
> unnecessary sleeps when using long polling.
> Hive has used long polling for a long time (see HIVE-5217).


