Sahil Takiar created IMPALA-9370:
------------------------------------

             Summary: Re-factor ImpalaServer, ClientRequestState, Coordinator 
protocol
                 Key: IMPALA-9370
                 URL: https://issues.apache.org/jira/browse/IMPALA-9370
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


All of these classes need to be updated to support transparent query retries,
and each one could do with some re-factoring so that query retries don't make
this code even more complex. For now, I'm going to list out some ideas /
suggestions:
 * Rename ImpalaServer to ImpalaService. I think ImpalaServer is a bit of a
misnomer because Impala isn't implementing its own server (it uses Thrift for
that); instead, it is providing a "service" to end users. This name is
consistent with Thrift "service"s as well.
 * Split up ClientRequestState. I'm not sure I fully understand what
ClientRequestState is supposed to encapsulate. Perhaps it originally captured
the state of the actual client request as well as some helper code, but it
seems to have evolved over time; it doesn't really look like a purely
"stateful" object any more (e.g. it manages admission control submission).

One possible end state could be:

ImpalaService <-> QueryDriver (has a ClientRequestState that is not exposed
externally) <-> QueryInstance <-> Coordinator

The QueryDriver is responsible for end-to-end (E2E) execution of a query,
including all stages such as parsing / planning of a query, submission to
admission control, and backend execution. A QueryInstance is a single instance
of a query; this is necessary for query retry support, since a single query can
be run multiple times. The Coordinator remains mostly the same: it is purely
responsible for *backend* coordination / execution of a query.
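
To make the proposed layering a bit more concrete, here is a rough sketch of
how the ownership and the retry loop might look. Everything in it is
hypothetical and simplified: the ExecFn callback, the max_attempts parameter,
and all method names are assumptions made for illustration and are not taken
from the Impala code base. Planning and admission control are left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for backend execution: runs attempt N, returns success.
using ExecFn = std::function<bool(int /*attempt*/)>;

// Purely responsible for backend coordination / execution of one attempt.
class Coordinator {
 public:
  explicit Coordinator(ExecFn exec) : exec_(std::move(exec)) {}
  bool Exec(int attempt) const { return exec_(attempt); }

 private:
  ExecFn exec_;
};

// A single run of a query. A retry creates a new QueryInstance.
class QueryInstance {
 public:
  QueryInstance(int attempt, ExecFn exec)
      : attempt_(attempt), coord_(std::move(exec)) {}
  bool Run() const { return coord_.Exec(attempt_); }

 private:
  int attempt_;
  Coordinator coord_;
};

// Drives a query end to end and owns every instance (attempt) of it.
class QueryDriver {
 public:
  QueryDriver(std::string sql, int max_attempts, ExecFn exec)
      : state_{std::move(sql), 0, false},
        max_attempts_(max_attempts),
        exec_(std::move(exec)) {
    assert(max_attempts_ > 0);
  }

  // Parsing, planning and admission control would happen here in a real
  // driver; this sketch only models the retry loop.
  bool Execute() {
    while (!state_.succeeded && state_.num_attempts < max_attempts_) {
      instances_.push_back(
          std::make_unique<QueryInstance>(state_.num_attempts, exec_));
      ++state_.num_attempts;
      state_.succeeded = instances_.back()->Run();
    }
    return state_.succeeded;
  }

  int num_attempts() const { return state_.num_attempts; }

 private:
  // Pure state, not exposed outside the driver.
  struct ClientRequestState {
    std::string sql;
    int num_attempts;
    bool succeeded;
  };

  ClientRequestState state_;
  int max_attempts_;
  ExecFn exec_;
  std::vector<std::unique_ptr<QueryInstance>> instances_;
};
```

In this sketch the service layer would only ever talk to QueryDriver, so a
retry stays invisible to the client: the driver simply creates another
QueryInstance and keeps the same ClientRequestState.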

This provides an opportunity to move a lot of the execution-specific logic out
of ImpalaServer and into QueryDriver. Currently, ImpalaServer is responsible
for submitting the query to the frontend (fe/) and then passing the result to
the ClientRequestState, which submits it for admission control (and eventually
to the Coordinator for execution).

QueryDriver encapsulates the E2E execution of a query: it starts from a query
string and returns the results of the query. It is inspired by Hive's IDriver
interface:
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/IDriver.java]


