[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant

Matt McCline (Jira) Thu, 10 Jun 2021 10:12:26 -0700

Matt McCline created HIVE-25237:
-----------------------------------

             Summary: Thrift CLI Service Protocol: Enhance HTTP variant
                 Key: HIVE-25237
                 URL: https://issues.apache.org/jira/browse/HIVE-25237
             Project: Hive
          Issue Type: Improvement
            Reporter: Matt McCline
            Assignee: Matt McCline



I have been thinking about the (Thrift) CLI Service protocol between the client 
and server.

Cloudera's Prashanth Jayachandran (private e-mail) told me that the protocol's 
original BINARY (TCP/IP) transport is designed +_differently_+ than the newer 
HTTP transport. HTTP is used when we go through a Gateway. The HTTP design is 
stateless and different in nature from the direct BINARY TCP/IP connection. 
This means that today a Hive Server 2 response to an HTTP query request can be 
lost, and that is part of the design. It is the WARNING we have seen when the 
Gateway drops its HTTP connection to Hive Server 2. We had been thinking this 
was a bug, but it is by design.

I think the HTTP design needs a rethink.

When I worked for Tandem Computers a long time ago, messages were 
fault-tolerant. They used a message sequence number. When you send a message to 
a Tandem server, the server is a process pair. The message gets routed to the 
current process, called the primary. The primary does the work for the message 
and tells the backup process to remember the result before replying, in case 
there is a failure. You can see where this goes -- if there is a failure before 
the client gets the result, the client retries and the backup process can 
resiliently give back the result the primary sent it. This isn't unique to 
Tandem -- even without a process pair, this is a general resilient protocol.

The HTTP design says a message can be lost in both directions (request and 
response). I think we should adopt a better scheme, though not necessarily a 
process pair.

The first principle of the rethink is that the +_client_+ needs to generate a 
new operation num (an integer) that replaces the server-side generated random 
GUID. The client also generates a new msg num within its new operation. So 
beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. 
If the client gets an OS connection kind of error, it retries with those same 
(57, 1) numbers. Hive Server 2 will remember the last response. When Hive 
Server 2 gets a message, there are 3 cases:

1) The sessionId GUID is not valid -- for now we reject the request because it 
is likely Hive Server 2 killed the session perhaps because it was restarted.

2) The operationNum or operationMsgNum is new. (Assert the msg num increases 
monotonically.) Perform the request, save the response, and respond.

3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
respond with the saved result.
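
The three cases above can be sketched in a few lines of Java. This is only an 
illustration of the idea, not a proposed patch: ResilientServer, SessionState, 
openSession and handle are hypothetical names, not existing Hive classes, and 
the sketch assumes one outstanding request per session and a response that can 
be held in memory as a String.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are existing Hive classes. */
final class ResilientServer {

    /** Per-session memory of the last request and its saved response. */
    private static final class SessionState {
        long lastOperationNum = -1;
        long lastMsgNum = -1;
        String lastResponse;
    }

    private final Map<String, SessionState> sessions = new ConcurrentHashMap<>();

    /** Opens a session; the session id is still a server-generated GUID. */
    String openSession() {
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, new SessionState());
        return sessionId;
    }

    /** Runs {@code work} for a new request, or replays the saved response for a retry. */
    String handle(String sessionId, long operationNum, long msgNum, Supplier<String> work) {
        SessionState state = sessions.get(sessionId);
        if (state == null) {
            // Case 1: unknown session (for example after a restart) -- reject.
            throw new IllegalArgumentException("Invalid session: " + sessionId);
        }
        synchronized (state) {
            boolean sameOperation = operationNum == state.lastOperationNum;
            if (sameOperation && msgNum == state.lastMsgNum) {
                // Case 3: retry of the last request -- resend the saved response.
                return state.lastResponse;
            }
            if (sameOperation && msgNum < state.lastMsgNum) {
                // The msg num must increase monotonically within an operation.
                throw new IllegalStateException("Stale msg num: " + msgNum);
            }
            // Case 2: new request -- perform it, save the response, respond.
            String response = work.get();
            state.lastOperationNum = operationNum;
            state.lastMsgNum = msgNum;
            state.lastResponse = response;
            return response;
        }
    }
}
```

With this shape, a retried (57, 1) request never re-runs the statement; it only 
gets the saved response back.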

I think this message handling is in alignment with the HTTP philosophy of being 
stateless and allowing any message in between to be lost. It will also shield 
the client from a whole category of message failures that unnecessarily kill 
queries.

This also means we do not have to worry about which requests are idempotent; 
instead, all requests are resilient.
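
On the client side, the retry could look roughly like the sketch below. It 
assumes a server that remembers the last response as described above. 
ResilientClient, Transport, beginOperation and call are hypothetical names for 
illustration only; they are not part of the Hive JDBC driver. Every request is 
retried with the same (operationNum, operationMsgNum) pair, whether or not it 
is idempotent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch; not part of the Hive JDBC driver. */
final class ResilientClient {

    /** Stand-in for the HTTP transport; throws IOException on a connection error. */
    interface Transport {
        String send(long operationNum, long msgNum, String request) throws IOException;
    }

    private final Transport transport;
    private final int maxAttempts;
    private long operationNum = 0;
    private long msgNum = 0;

    ResilientClient(Transport transport, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.transport = transport;
        this.maxAttempts = maxAttempts;
    }

    /** Starts a new client-numbered operation and resets its msg num. */
    void beginOperation() {
        operationNum++;
        msgNum = 0;
    }

    /** Sends one message, retrying connection errors with the same numbers. */
    String call(String request) {
        msgNum++;
        IOException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transport.send(operationNum, msgNum, request);
            } catch (IOException e) {
                // The request or the response may have been lost; the numbers
                // stay the same so the server can recognize the retry.
                lastError = e;
            }
        }
        throw new UncheckedIOException("Gave up after " + maxAttempts + " attempts", lastError);
    }
}
```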

---------------------

Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
