[jira] [Comment Edited] (HIVE-15473) Progress Bar on Beeline client

anishek (JIRA) Thu, 26 Jan 2017 23:18:07 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842353#comment-15842353
 ]


anishek edited comment on HIVE-15473 at 1/27/17 7:16 AM:
---------------------------------------------------------

There are a few observations / limitations that [~thejas] cited while 
reviewing this. Writing down the reasoning here, along with the steps for how we 
can move forward.

Given that we use SynchronizedHandler for the client on the beeline side, only 
one operation / API call at a time can be executing from a single beeline 
session to HiveServer2. The current flow for updating the progress bar on the 
client side is:

Thread 1 -- does statement execution: it calls GetOperationStatus for the 
operation from beeline until the operation has finished executing. The 
server-side implementation of GetOperationStatus uses a timeout mechanism (it 
waits for the query execution to finish, up to the timeout) before it sends the 
status to the client. The timeout value is decided by a step function; for 
long-running queries this leads to a wait of approximately 5 seconds per call 
to GetOperationStatus.
Thread 2 -- prints query logs and progress logs.
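For illustration, a rough sketch of how such a step function could compute the per-call wait. The parameters are assumptions, not values taken from the HiveServer2 source: a 500 ms starting wait, 500 ms added for every 10 s the operation has been running, and a cap (assumed 5000 ms here, matching the ~5 second wait above). PollingTimeout and timeoutMs are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a long-polling step function; NOT the HiveServer2 code.
// Assumed parameters: start at 500 ms, add 500 ms for every 10 s of elapsed
// run time, never exceed maxTimeoutMs (e.g. 5000 ms).
final class PollingTimeout {
    static final long STEP_MS = 500;
    static final long STEP_INTERVAL_MS = TimeUnit.SECONDS.toMillis(10);

    private PollingTimeout() {}

    /**
     * How long a GetOperationStatus call would wait for the query to finish
     * before replying, given how long the operation has been running.
     */
    static long timeoutMs(long elapsedMs, long maxTimeoutMs) {
        if (elapsedMs < 0 || maxTimeoutMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be >= 0 and maxTimeoutMs > 0");
        }
        long stepped = (elapsedMs / STEP_INTERVAL_MS + 1) * STEP_MS;
        return Math.min(maxTimeoutMs, stepped);
    }
}
```

With these assumed numbers, once a query has been running for 90 seconds or longer, each status call blocks for up to the full 5 seconds (less if the query finishes first).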

*Problem Space:*
# Since the client synchronizes the various API calls, only one API call from 
either Thread 1 or Thread 2 is executed at a time. Structuring the beeline code 
to suggest concurrent execution is therefore misleading, and with the current 
patch the progress bar / query log updates can be delayed by 5+ seconds ( _I 
don't think we can avoid this anyway, as I will discuss later_ ).
# Additionally, since no *order* is maintained among threads requesting 
synchronization on an object, Thread 1 may acquire the lock again without 
Thread 2 getting a chance to obtain it, leading to long delays in updating the 
query log or progress log ( _I am not sure how this would happen for 
long-running queries: while Thread 1 is executing, Thread 2 would already be 
blocked on the object's monitor. Once Thread 1 completes, and before it comes 
around the while loop in_
{code}
HiveStatement.waitForOperationToComplete()
{code}
_Thread 2 should start executing. It seems highly improbable that Thread 1 
completes, executes additional statements and acquires the lock again before 
Thread 2 gets a chance to acquire it_ )
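To make the serialization in 1. concrete, here is a minimal sketch of a SynchronizedHandler-style wrapper: a java.lang.reflect.Proxy whose handler holds one shared monitor around every call. StatusClient and SynchronizedProxy are hypothetical stand-ins, not the real Thrift client or Hive classes. However many threads use the proxy, at most one call is in flight, and because Java's intrinsic locks make no fairness guarantee, nothing forces the waiting thread to be next in line, which is the ordering concern in 2.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the Thrift client interface (not TCLIService.Iface).
interface StatusClient {
    String getOperationStatus() throws Exception; // used by "Thread 1"
    String fetchLogs() throws Exception;          // used by "Thread 2"
}

// SynchronizedHandler-style wrapper: every method call on the proxy runs while
// holding the delegate's monitor, so at most one call is in flight at a time.
// Intrinsic locks are not fair: a thread that just released the monitor may
// re-acquire it before a thread that has been waiting.
final class SynchronizedProxy {
    private SynchronizedProxy() {}

    static StatusClient wrap(StatusClient delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (delegate) {
                try {
                    return method.invoke(delegate, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the delegate's own exception
                }
            }
        };
        return (StatusClient) Proxy.newProxyInstance(
                StatusClient.class.getClassLoader(),
                new Class<?>[] {StatusClient.class},
                handler);
    }
}
```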

So, in summary:
* Avoid multi-threaded code in beeline for interactions with HiveServer2, since 
the Thrift connection does not support concurrent calls, unless we move to 
ThriftHttpCliService using an HTTP-based connection, or use a NonBlockingThrift 
server for the binary protocol on the server side.
* Address the issue of responsiveness if we can.

*Solution Space:*
Since concurrent execution is not supported, anything programmed to that effect 
should be avoided in the beeline client. Hence, we aim to remove the 
multi-threaded code from the beeline side, in effect merging the query log and 
progress bar log into the GetOperationStatus API. This would still not address 
the responsiveness issue indicated in 1. above, as GetOperationStatus will still 
wait before responding to calls from beeline, unless we decide to remove this 
wait or reduce it to a default of, say, 500 milliseconds. I am not sure why the 
step function is used ( _to prevent the server from wasting CPU resources on 
non-critical operations?_ ). This will address 2. above, though, since we will 
get all the information in a single call.
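A sketch of the single-threaded alternative, assuming a hypothetical combined response that carries the status, any new log lines and the rendered progress from one GetOperationStatus call. Only the executing thread talks to the server, so no second thread competes for the lock. The loop has no client-side sleep: it assumes pacing still comes from the server-side wait, as today. CombinedStatus, CombinedStatusRpc and SingleThreadedPoller are illustrative names, not the actual Thrift types.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical combined response: status, new query-log lines and rendered
// progress, all returned by ONE status call. Not the real Thrift response type.
final class CombinedStatus {
    final boolean finished;
    final List<String> newLogLines;
    final String progress;

    CombinedStatus(boolean finished, List<String> newLogLines, String progress) {
        this.finished = finished;
        this.newLogLines = newLogLines;
        this.progress = progress;
    }
}

// Hypothetical single-call RPC abstraction over the (synchronized) client.
interface CombinedStatusRpc {
    CombinedStatus getOperationStatus() throws Exception;
}

final class SingleThreadedPoller {
    private SingleThreadedPoller() {}

    /**
     * Polls on the calling thread only and feeds each response to the log and
     * progress sinks. Returns the number of status calls made.
     */
    static int waitForCompletion(CombinedStatusRpc rpc, Consumer<String> logSink,
                                 Consumer<String> progressSink) throws Exception {
        int calls = 0;
        while (true) {
            // Assumed to block on the server side for up to its poll timeout.
            CombinedStatus status = rpc.getOperationStatus();
            calls++;
            status.newLogLines.forEach(logSink);
            progressSink.accept(status.progress);
            if (status.finished) {
                return calls;
            }
        }
    }
}
```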

*Implementation Considerations:*
# Merge the QueryLog and ProgressBarLog requests / responses into 
GetOperationStatus.
# To get this working, we have to extend HiveStatement with a few non-JDBC- 
compliant setters (one interface for displaying the progress bar, the other for 
displaying query logs); the default implementations of these will do nothing.
# Have setters on HiveStatement for both interfaces, used by beeline to provide 
the required implementations.
# As part of the HiveStatement execute(*) call, create the appropriate request 
if custom implementations of the interfaces above are provided.
# We might need to create an additional function signature for 
GetOperationStatus for backward compatibility.
# _Not related to the above_: make sure we pass the vertex progress as a string 
(for progress bar display) and the query progress as a custom enum for decision 
making (with implementations on the server side to map from the 
execution-engine-specific state to our generic enum state).
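Items 2 to 5 could look roughly like the following sketch. ProgressDisplay, QueryLogDisplay, StatusRequest and ProgressAwareStatement are hypothetical names standing in for the new interfaces and the HiveStatement additions. The no-op defaults mean plain JDBC users are unaffected, and the request only asks the server for logs / progress when beeline has installed a real implementation. The two flags hint at how an extended request could stay backward compatible; how that is done on the Thrift side is left open here.

```java
import java.util.List;

// Hypothetical callback interfaces; names are illustrative, not Hive APIs.
interface ProgressDisplay {
    ProgressDisplay NO_OP = progress -> { }; // default: do nothing
    void update(String progress);
}

interface QueryLogDisplay {
    QueryLogDisplay NO_OP = lines -> { };    // default: do nothing
    void print(List<String> lines);
}

// Hypothetical request: the extra data is opt-in via flags, so callers that
// never set them get the same request as before.
final class StatusRequest {
    final boolean wantProgress;
    final boolean wantLogs;

    StatusRequest(boolean wantProgress, boolean wantLogs) {
        this.wantProgress = wantProgress;
        this.wantLogs = wantLogs;
    }
}

// Stand-in for the non-JDBC additions to HiveStatement; not the real class.
class ProgressAwareStatement {
    private ProgressDisplay progressDisplay = ProgressDisplay.NO_OP;
    private QueryLogDisplay queryLogDisplay = QueryLogDisplay.NO_OP;

    // Non-JDBC setters that beeline would call before execute(...);
    // null resets to the do-nothing default.
    public void setProgressDisplay(ProgressDisplay display) {
        this.progressDisplay = display == null ? ProgressDisplay.NO_OP : display;
    }

    public void setQueryLogDisplay(QueryLogDisplay display) {
        this.queryLogDisplay = display == null ? QueryLogDisplay.NO_OP : display;
    }

    // Would be called from execute(*): only request the extra data when a
    // custom (non no-op) implementation was supplied.
    StatusRequest buildStatusRequest() {
        return new StatusRequest(progressDisplay != ProgressDisplay.NO_OP,
                                 queryLogDisplay != QueryLogDisplay.NO_OP);
    }
}
```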
 
If we are too worried about the responsiveness of the progress bar, or consider 
*2. in Problem Space* a major impediment to Hive usage, we should go with the 
new implementation proposal; otherwise we just additionally implement *6. in 
Implementation Considerations*.
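For *6. in Implementation Considerations*, a minimal sketch of the split between display text and decision-making state. QueryState and ProgressUpdate are hypothetical names, and the engine state strings in the mapping are assumed Tez-style examples, not an exhaustive or authoritative list.

```java
import java.util.Locale;

// Hypothetical generic query state used by the client for decision making.
enum QueryState {
    PENDING, RUNNING, FINISHED, FAILED, KILLED, UNKNOWN;

    boolean isTerminal() {
        return this == FINISHED || this == FAILED || this == KILLED;
    }

    // Server-side mapping from an engine-specific state name. The input names
    // are assumed examples (Tez-like DAG states), not an exhaustive list.
    static QueryState fromEngineState(String engineState) {
        if (engineState == null) {
            return UNKNOWN;
        }
        switch (engineState.toUpperCase(Locale.ROOT)) {
            case "SUBMITTED":
            case "INITING":
                return PENDING;
            case "RUNNING":
                return RUNNING;
            case "SUCCEEDED":
                return FINISHED;
            case "FAILED":
            case "ERROR":
                return FAILED;
            case "KILLED":
                return KILLED;
            default:
                return UNKNOWN;
        }
    }
}

// What the client would receive: display text is opaque, the state drives logic.
final class ProgressUpdate {
    final String vertexProgressText; // rendered as-is by the progress bar
    final QueryState state;          // drives client decisions, e.g. stop polling

    ProgressUpdate(String vertexProgressText, QueryState state) {
        this.vertexProgressText = vertexProgressText;
        this.state = state;
    }

    boolean shouldKeepPolling() {
        return !state.isTerminal();
    }
}
```

Keeping the vertex progress as an opaque string lets the server change how it renders progress without client changes, while the client only branches on the small enum.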




> Progress Bar on Beeline client
> ------------------------------
>
>                 Key: HIVE-15473
>                 URL: https://issues.apache.org/jira/browse/HIVE-15473
>             Project: Hive
>          Issue Type: Improvement
>          Components: Beeline, HiveServer2
>    Affects Versions: 2.1.1
>            Reporter: anishek
>            Assignee: anishek
>            Priority: Minor
>         Attachments: HIVE-15473.2.patch, HIVE-15473.3.patch, 
> HIVE-15473.4.patch, HIVE-15473.5.patch, screen_shot_beeline.jpg
>
>
> Hive CLI can show a progress bar for the Tez execution engine, as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user 
> connects via the beeline command-line client as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
