[
https://issues.apache.org/jira/browse/HIVE-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt McCline updated HIVE-25237:
--------------------------------
Summary: Thrift CLI Service Protocol: Enhance HTTP variant to be more
resilient (was: Thrift CLI Service Protocol: Enhance HTTP variant)
> Thrift CLI Service Protocol: Enhance HTTP variant to be more resilient
> ----------------------------------------------------------------------
>
> Key: HIVE-25237
> URL: https://issues.apache.org/jira/browse/HIVE-25237
> Project: Hive
> Issue Type: Improvement
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
>
> I have been thinking about the (Thrift) CLI Service protocol between the
> client and server.
> Cloudera's Prasanth Jayachandran (private e-mail) told me that its original
> BINARY (TCP/IP) transport is designed +_differently_+ than the newer HTTP
> transport. HTTP is used when we go through a Gateway. The design for HTTP is
> stateless and different in nature from the direct BINARY TCP/IP connection.
> This means that today a Hive Server 2 response to an HTTP query request can
> be lost, and that is part of the design. It is the WARNING we have seen when
> the Gateway drops its HTTP connection to Hive Server 2. We had been thinking
> this was a bug, but it is by design.
> I think the HTTP design needs a rethink.
> When I worked for Tandem Computers a long time ago, messages were
> fault-tolerant. They used a message sequence #. When you send a message to a
> Tandem server, the server is a process pair. The message gets routed to the
> current process, called the primary. The primary performs the work for the
> message and, before replying, tells the backup process to remember the
> result in case there is a failure. You can see where this goes -- if there is
> a failure before the client gets the result, the client retries and the
> backup process can resiliently give back the result the primary sent it.
> This isn't unique to Tandem: even without a process pair, this is a general
> resilient protocol.
> The HTTP design says message loss is possible in both directions (request
> and response). I think we should adopt a better scheme, but not necessarily a
> process pair.
> The first principle of the rethink is that the +_client_+ needs to generate a
> new operation num (an integer) that replaces the server-side generated random
> GUID. The client also generates a new msg num within each new operation. So
> beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum =
> 1. If the client gets an OS-level connection error, it retries with those
> same (57, 1) numbers. Hive Server 2 will remember the last response. When
> Hive Server 2 gets a message, there are 3 cases:
> 1) The sessionId GUID is not valid -- for now we reject the request because
> it is likely Hive Server 2 killed the session perhaps because it was
> restarted.
> 2) The operationNum or operationMsgNum is new. (Assert the msg num increases
> monotonically.) Perform the request, save the response, and respond.
> 3) The (operationNum, operationMsgNum) matches the last request. Resiliently
> respond with the saved result.
> I think this message handling aligns with the HTTP philosophy of being
> stateless and allowing any message in between to be lost. And it will shield
> the client from a whole category of message failures that unnecessarily kill
> queries.
> This also means we need not worry about which requests are idempotent;
> instead, all requests are resilient.
> ---------------------
> Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for
> idempotent and unsent http methods by prasanthj · Pull Request #1983 ·
> apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]
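
The three-case message handling proposed in the description can be sketched
roughly as follows. This is only an illustrative model in Python, not Hive
code: the names ResilientServer, Session, LossyTransport and call_with_retry
are hypothetical, and the tuple comparison assumes that operation nums also
increase over time, which the description does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class InvalidSession(Exception):
    """Case 1: the session id is unknown, e.g. after a server restart."""


@dataclass
class Session:
    # Last (operation_num, operation_msg_num) handled and its saved response.
    last_key: Optional[Tuple[int, int]] = None
    last_response: Optional[str] = None


@dataclass
class ResilientServer:
    execute: Callable[[str], str]  # performs the actual request work
    sessions: Dict[str, Session] = field(default_factory=dict)

    def open_session(self, session_id: str) -> None:
        self.sessions[session_id] = Session()

    def handle(self, session_id: str, op_num: int, msg_num: int,
               request: str) -> str:
        session = self.sessions.get(session_id)
        if session is None:
            # Case 1: reject requests for sessions the server does not know.
            raise InvalidSession(session_id)
        key = (op_num, msg_num)
        if key == session.last_key:
            # Case 3: a retry of the last request; replay the saved response
            # without performing the work again.
            return session.last_response
        # Case 2: a new request. Assumption: numbers only move forward.
        if session.last_key is not None and key < session.last_key:
            raise ValueError(f"stale message {key}, last was {session.last_key}")
        response = self.execute(request)
        session.last_key = key
        session.last_response = response
        return response


class LossyTransport:
    """Delivers every request but drops the first `drop` responses."""

    def __init__(self, server: ResilientServer, drop: int) -> None:
        self.server = server
        self.drop = drop

    def send(self, session_id: str, op_num: int, msg_num: int,
             request: str) -> str:
        response = self.server.handle(session_id, op_num, msg_num, request)
        if self.drop > 0:
            self.drop -= 1
            raise ConnectionError("response lost in transit")
        return response


def call_with_retry(transport: LossyTransport, session_id: str, op_num: int,
                    msg_num: int, request: str, attempts: int = 3) -> str:
    # The client reuses the same (op_num, msg_num) pair on every retry.
    for attempt in range(attempts):
        try:
            return transport.send(session_id, op_num, msg_num, request)
        except ConnectionError:
            if attempt == attempts - 1:
                raise


def demo() -> Tuple[str, int]:
    executed = []

    def run(statement: str) -> str:
        executed.append(statement)
        return f"result of {statement}"

    server = ResilientServer(execute=run)
    server.open_session("s1")
    transport = LossyTransport(server, drop=1)
    result = call_with_retry(transport, "s1", 57, 1, "SELECT 1")
    return result, len(executed)
```

In demo(), the first response is dropped after the server has already done
the work. The retry with the same (57, 1) pair gets the saved response, so the
statement is executed only once even though the request was sent twice.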