Subramanyam Pattipaka created HIVE-15947:
--------------------------------------------

             Summary: Enhance Templeton service job operations reliability
                 Key: HIVE-15947
                 URL: https://issues.apache.org/jira/browse/HIVE-15947
             Project: Hive
          Issue Type: Bug
            Reporter: Subramanyam Pattipaka
            Assignee: Subramanyam Pattipaka


Currently the Templeton service doesn't restrict the number of job operation
requests. It simply accepts and tries to run all of them. If a large number of
concurrent job submit requests arrive, the time to submit job operations can
increase significantly. Templeton uses HDFS to store the staging files for a
job. If HDFS storage can't keep up with a large number of requests and
throttles, job submission can take a very long time, on the order of minutes.

This behavior may not be suitable for all applications. Client applications
may need a predictable, low response time for successful requests, or
otherwise a throttle response telling the client to wait for some time before
retrying the job operation.

In this JIRA, I am trying to address the following job operations:
1) Submit new Job
2) Get Job Status
3) List jobs

These three operations have different complexity because they use cluster
resources such as YARN and HDFS to different degrees.

The idea is to introduce a new config, templeton.job.submit.exec.max-procs,
which controls the maximum number of concurrent active job submissions within
Templeton, and to use this config to keep response times under control. If a
new job submission request sees that there are already
templeton.job.submit.exec.max-procs jobs being submitted concurrently, the
request will fail with HTTP error 503 with the reason

   "Too many concurrent job submission requests received. Please wait for some
time before retrying."
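
One way to picture the proposed check is a counting semaphore in front of the
submit path. The following is only a minimal sketch under stated assumptions,
not the actual Templeton change: the class JobRequestGate, its Result type, and
the run method are hypothetical names invented for illustration, and a plain
status code stands in for a real HTTP response. A limit of 0 is treated as "no
limit", matching the default described in this issue.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical gate limiting concurrent job operations; not Templeton's code. */
class JobRequestGate {
    static final int HTTP_OK = 200;
    static final int HTTP_SERVICE_UNAVAILABLE = 503;
    static final String BUSY_MESSAGE =
        "Too many concurrent job submission requests received. "
        + "Please wait for some time before retrying.";

    /** Minimal stand-in for an HTTP response: a status code plus a body. */
    static final class Result {
        final int status;
        final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // null models "no limit", which is how this sketch treats a configured 0.
    private final Semaphore permits;

    JobRequestGate(int maxProcs) {
        this.permits = maxProcs > 0 ? new Semaphore(maxProcs) : null;
    }

    /** Runs the operation if a slot is free; otherwise rejects at once with 503. */
    Result run(Supplier<String> operation) {
        if (permits != null && !permits.tryAcquire()) {
            // All slots are busy: fail fast instead of queueing the request.
            return new Result(HTTP_SERVICE_UNAVAILABLE, BUSY_MESSAGE);
        }
        try {
            return new Result(HTTP_OK, operation.get());
        } finally {
            if (permits != null) {
                permits.release(); // free the slot even if the operation throws
            }
        }
    }
}
```

The sketch uses a non-blocking tryAcquire so that a rejected request returns
immediately, which is what gives the client a predictable response time.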
 
The client is expected to catch this response and retry after waiting for some
time. The default value for the config templeton.job.submit.exec.max-procs is
'0', which means that by default job submission requests are always accepted.
The limit needs to be enabled explicitly based on requirements.
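
On the client side, handling the 503 response might look like the sketch
below. It is an illustration only: the class ThrottleAwareClient, the method
submitWithRetry, and the doubling backoff policy are assumptions made here, not
something this issue prescribes. The request is modeled as an IntSupplier
returning an HTTP status code, so the sketch does not depend on any particular
HTTP client.

```java
import java.util.function.IntSupplier;

/** Hypothetical client helper; the retry policy is an assumption of this sketch. */
class ThrottleAwareClient {
    static final int HTTP_SERVICE_UNAVAILABLE = 503;

    /**
     * Calls the request until it returns something other than 503 or the
     * attempts run out, doubling the wait between tries.
     * Returns the last status code seen.
     */
    static int submitWithRetry(IntSupplier request, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        int status = request.getAsInt();
        for (int attempt = 1;
             attempt < maxAttempts && status == HTTP_SERVICE_UNAVAILABLE;
             attempt++) {
            try {
                Thread.sleep(delay); // wait before asking the server again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return status;
            }
            delay *= 2;
            status = request.getAsInt();
        }
        return status;
    }
}
```

A caller would wrap its actual submit call in the supplier and treat a final
503 as "server still busy, give up or try again later".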

We can have similar behavior for Status and List operations with configs 
templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs 
respectively.
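
Reading the three limits could be sketched as follows. This is a hypothetical
helper, not Templeton's configuration code: java.util.Properties stands in for
the real configuration object, the class JobLimitConfig and its methods are
invented names, and applying the default of 0 to the status and list keys is an
assumption (the issue states that default only for the submit key).

```java
import java.util.Properties;

/** Hypothetical lookup of the three proposed limits; names are illustrative. */
class JobLimitConfig {
    static final String SUBMIT = "templeton.job.submit.exec.max-procs";
    static final String STATUS = "templeton.job.status.exec.max-procs";
    static final String LIST = "templeton.list.job.exec.max-procs";

    /** Returns the configured limit, or 0 (no limit) when the key is unset. */
    static int maxProcs(Properties conf, String key) {
        int value = Integer.parseInt(conf.getProperty(key, "0").trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }

    /** True only when a positive limit has been configured for the key. */
    static boolean isLimited(Properties conf, String key) {
        return maxProcs(conf, key) > 0;
    }
}
```

Keeping one key per operation lets each limit be tuned separately, which fits
the point above that the three operations differ in cost.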



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
