[ 
https://issues.apache.org/jira/browse/IMPALA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972132#comment-16972132
 ] 

Vihang Karajgaonkar commented on IMPALA-9140:
---------------------------------------------

{quote}In short, async load requests only have chance to run after all already 
pending sync load requests finish in the above case (poolSize=1). This is what 
I think how the second pool balances the sync and async loads.
{quote}
Thanks [~stigahuang]. I think what you said makes sense. So it looks like the 
background load thread pool caps the throughput of async loads (the number of 
async load requests that can run in parallel at a time), while sync load 
requests are executed on demand as soon as a thread becomes available.
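
A toy model of that ordering, assuming (as the issue description says) that prioritized loads go to the front of the queue and background loads to the back, and that a single loader drains it. The class and method names here are made up for illustration and are not the real {{TableLoadingMgr}} code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

/** Toy model (hypothetical names): one loader, sync requests jump the queue. */
class LoadOrderSketch {
  /**
   * Queues the async requests first and the sync requests afterwards, then
   * returns the order in which a single loader would pick them up.
   */
  static List<String> runOrder(List<String> asyncTables, List<String> syncTables) {
    LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>();
    for (String t : asyncTables) {
      deque.offerLast(t);   // background load: back of the queue
    }
    for (String t : syncTables) {
      deque.offerFirst(t);  // prioritized load: front of the queue
    }
    List<String> order = new ArrayList<>();
    String next;
    while ((next = deque.pollFirst()) != null) {
      order.add(next);      // the single loader always takes from the front
    }
    return order;
  }
}
```

Even though the async requests were queued first, every pending sync request comes out ahead of them. Note that in this sketch the sync requests come out newest-first among themselves, because each one is pushed to the front.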

There are probably easier ways to implement this. For example, if the desired 
throughput of async requests is {{K}} requests in parallel, you can do this 
using a single thread which polls the queue such that it submits at most {{K}} 
requests at a time. Something like (I may have skipped some details and have 
the variable names incorrect):
{code:java}
ExecutorService backgroundTaskSubmitter = Executors.newSingleThreadExecutor();
backgroundTaskSubmitter.submit(new Runnable() {
  @Override
  public void run() {
    while (true) {
      // Take the next batch of at most K async requests from the queue.
      List<TableName> nextBatch = new ArrayList<>(K);
      int numTasks = tableLoadingDeque_.drainTo(nextBatch, K);
      List<FutureTask<Table>> futures = new ArrayList<>(numTasks);
      for (int i = 0; i < numTasks; i++) {
        // Submit each tableName to the tblLoading pool if the table is not
        // already being loaded, and add the resulting task to futures.
      }
      for (FutureTask<Table> f : futures) {
        try {
          // Wait for the task to complete.
          f.get();
        } catch (InterruptedException | ExecutionException e) {
          // Error handling omitted in this sketch.
        }
      }
    }
  }
});
{code}
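
For reference, a self-contained variant of the same idea that can actually be compiled and run. This is only a sketch under a few assumptions: table names are plain strings, the "load" is a short sleep, and the loop stops when the queue is empty instead of running forever. {{BatchSubmitterSketch}} and {{drainInBatches}} are hypothetical names, not existing Impala code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a single submitter that keeps at most k loads in flight. */
class BatchSubmitterSketch {
  /**
   * Drains the queue in batches of at most k, runs each batch on loadPool and
   * waits for the whole batch before taking the next one. Returns the peak
   * number of loads that were running at the same time.
   */
  static int drainInBatches(BlockingQueue<String> queue, ExecutorService loadPool, int k) {
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<String> batch = new ArrayList<>(k);
    // drainTo() does not block, so this sketch simply stops on an empty queue.
    while (queue.drainTo(batch, k) > 0) {
      List<Future<?>> futures = new ArrayList<>(batch.size());
      for (String table : batch) {
        futures.add(loadPool.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(10);  // stand-in for actually loading 'table'
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            running.decrementAndGet();
          }
        }));
      }
      for (Future<?> f : futures) {
        try {
          f.get();  // wait for every load in this batch
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException(e);
        }
      }
      batch.clear();
    }
    return peak.get();
  }
}
```

Because each batch is fully awaited before the next one is drained, the returned peak never exceeds {{k}}, even if the load pool has more threads. A long-running submitter would have to block while the queue is empty (for example with {{take()}}), since {{drainTo()}} returns 0 immediately and a bare {{while(true)}} loop would spin.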

Maybe this approach is not significantly better, unless you guys think it 
improves the code readability by a lot. Otherwise, we can just close this JIRA 
as "Not a problem".

> Get rid of the unnecessary load submitter thread pool in tblLoadingMgr
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9140
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9140
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> This JIRA is created as a followup on the discussion on 
> https://gerrit.cloudera.org/#/c/14611 related to various pools used for 
> loading tables.
> It looks like there are 2 pools of threads, both of size 
> {{num_metadata_loading_threads}}. One pool is used to submit the load 
> requests to another pool {{tblLoadingPool_}} which does the actual loading of 
> the tables. I think we can get rid of the pool which submits the tasks since 
> it is not a very time-consuming operation and can be done synchronously (all 
> it needs to do is submit the task to the front or back of the queue, based on 
> whether it's a prioritized load or a background load). This will simplify the 
> loading code and reduce the number of unnecessary threads created by 
> {{TblLoadingMgr}}.


