[ 
https://issues.apache.org/jira/browse/IMPALA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971922#comment-16971922
 ] 

Jiawei Wang commented on IMPALA-9140:
-------------------------------------

Thanks for the detailed explanation! I learned a lot from the comments.

Here are some of my thoughts on this:
{quote}In short, async load requests only have a chance to run after all already 
pending sync load requests finish in the above case (poolSize=1). This is how 
I think the second pool balances the sync and async loads.
{quote}
When the threads started by _TableLoadingMgr.startTableLoadingThreads_ pick up 
a task from tableLoadingDeque_, they effectively turn the async load request 
into a sync load request, because loadNextTable() is synchronous: the thread 
blocks until the load it hands to tblLoadingPool_ finishes.
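A minimal Java sketch of that point, assuming a single-threaded loading pool. All names here ({{BlockingLoaderSketch}}, {{loadAsync}}, and the stand-ins for tableLoadingDeque_ and tblLoadingPool_) are hypothetical and this is not Impala's actual code. The request enters the deque without blocking, but the thread that later takes it blocks on {{Future.get()}}, so from that point on it waits exactly like a sync caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingDeque;

public class BlockingLoaderSketch {
  // Hypothetical stand-ins for tableLoadingDeque_ and tblLoadingPool_.
  private final LinkedBlockingDeque<String> tableLoadingDeque = new LinkedBlockingDeque<>();
  private final ExecutorService tblLoadingPool = Executors.newFixedThreadPool(1);
  private final List<String> loaded = Collections.synchronizedList(new ArrayList<>());

  // Async entry point: only enqueues the table name and returns immediately.
  void loadAsync(String tbl) {
    tableLoadingDeque.offerLast(tbl);
  }

  // Hypothetical analogue of loadNextTable(): the request arrived asynchronously,
  // but this thread blocks on the Future, so it now behaves like a sync load
  // competing for a tblLoadingPool thread.
  void loadNextTable() throws Exception {
    String tbl = tableLoadingDeque.takeFirst();
    Future<?> f = tblLoadingPool.submit(() -> loaded.add(tbl));
    f.get(); // blocks until the pool has finished "loading" the table
  }

  static List<String> run(List<String> tables) throws Exception {
    BlockingLoaderSketch mgr = new BlockingLoaderSketch();
    for (String t : tables) mgr.loadAsync(t);
    for (int i = 0; i < tables.size(); i++) mgr.loadNextTable();
    mgr.tblLoadingPool.shutdown();
    return new ArrayList<>(mgr.loaded);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run(List.of("db.t1", "db.t2"))); // [db.t1, db.t2]
  }
}
```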

So the race actually happens between pending sync load requests and the async 
load requests issued by the _numLoadingThreads_ threads of 
_startTableLoadingThreads._ For example, with _numLoadingThreads_=3, suppose 
all 3 threads in tblLoadingPool_ are busy with loading tasks, and then a sync 
load request arrives and waits to be loaded. The 3 daemon threads in 
_TableLoadingMgr.startTableLoadingThreads_ can still have 3 pending async load 
requests (which are effectively sync, because those threads block) racing with 
any true sync jobs from DDL/DML.
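One possible interleaving of that example can be reproduced with a plain {{ThreadPoolExecutor}} standing in for tblLoadingPool_ (fixed size, unbounded FIFO work queue). This is a hypothetical sketch, not Impala code: {{LoadRaceSketch}}, {{Load}}, and the task labels are made up, and the background submissions are issued from one thread for determinism instead of from real daemon threads. Once every pool thread is busy, the sync request that arrives last sits behind all of the already-queued background loads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LoadRaceSketch {
  // A labeled task that blocks on a gate, so the pool's queue can be inspected.
  record Load(String label, CountDownLatch gate) implements Runnable {
    public void run() {
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  // Returns the labels waiting in the pool's work queue after the setup below.
  static List<String> queuedLabels(int numLoadingThreads) throws InterruptedException {
    CountDownLatch gate = new CountDownLatch(1);
    // Hypothetical stand-in for tblLoadingPool_: fixed size, FIFO queue.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(numLoadingThreads, numLoadingThreads,
        0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    // 1. Every pool thread is busy with an in-flight load.
    for (int i = 0; i < numLoadingThreads; i++) pool.execute(new Load("busy" + i, gate));
    // 2. Each background loader thread has already handed one table to the pool.
    for (int i = 0; i < numLoadingThreads; i++) pool.execute(new Load("background" + i, gate));
    // 3. A sync load from DDL/DML arrives last and queues behind them.
    pool.execute(new Load("sync", gate));
    List<String> labels = new ArrayList<>();
    for (Runnable r : pool.getQueue()) labels.add(((Load) r).label());
    gate.countDown();
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return labels;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(queuedLabels(3)); // [background0, background1, background2, sync]
  }
}
```

The first {{numLoadingThreads}} tasks become the workers' first tasks rather than being queued, which is why only the background and sync tasks appear in the queue.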

But I do agree that the threads in _startTableLoadingThreads_ can provide some 
level of balance between sync and async jobs.
{quote}I think maybe a PriorityQueue is a better fit. And the priority can be 
defined as sync load > prioritized load > background load. But it's different 
from the current behavior. Async load requests may starve if there are always new 
sync load requests jumping in. So I still think the current implementation 
makes some sense.
{quote}
I think the PriorityQueue solution actually makes sense here. The starvation 
situation [~stigahuang] mentioned also exists in the current implementation, 
because we can get a lot of new sync load requests and there is no guarantee 
of when an idle thread in _tblLoadingPool__ will pick up an async task.
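A minimal sketch of the PriorityQueue idea, assuming a {{java.util.concurrent.PriorityBlockingQueue}} and the ordering sync > prioritized > background, with a sequence number to keep FIFO order within one level. {{LoadPriority}}, {{LoadRequest}}, and the other names are hypothetical, not existing Impala types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PriorityLoadQueueSketch {
  // Lower ordinal = higher priority: sync > prioritized > background.
  enum LoadPriority { SYNC, PRIORITIZED, BACKGROUND }

  // Hypothetical request type; seq keeps FIFO order within one priority level.
  record LoadRequest(String table, LoadPriority priority, long seq)
      implements Comparable<LoadRequest> {
    @Override
    public int compareTo(LoadRequest o) {
      int c = priority.compareTo(o.priority);
      return c != 0 ? c : Long.compare(seq, o.seq);
    }
  }

  private final PriorityBlockingQueue<LoadRequest> queue = new PriorityBlockingQueue<>();
  private final AtomicLong nextSeq = new AtomicLong();

  void submit(String table, LoadPriority priority) {
    queue.add(new LoadRequest(table, priority, nextSeq.getAndIncrement()));
  }

  // What a loading thread would do: take() blocks until a request is available.
  String takeNext() throws InterruptedException {
    return queue.take().table();
  }

  static List<String> drainOrder() throws InterruptedException {
    PriorityLoadQueueSketch q = new PriorityLoadQueueSketch();
    q.submit("bg1", LoadPriority.BACKGROUND);
    q.submit("prio1", LoadPriority.PRIORITIZED);
    q.submit("sync1", LoadPriority.SYNC);
    q.submit("bg2", LoadPriority.BACKGROUND);
    q.submit("sync2", LoadPriority.SYNC);
    List<String> order = new ArrayList<>();
    for (int i = 0; i < 5; i++) order.add(q.takeNext());
    return order;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(drainOrder()); // [sync1, sync2, prio1, bg1, bg2]
  }
}
```

If such a queue were used directly as the work queue of a {{ThreadPoolExecutor}}, tasks would need to be handed over with {{execute()}}, since {{submit()}} wraps them in a {{FutureTask}} that is not {{Comparable}}. This strict ordering also does nothing about the starvation concern quoted above.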

I have only limited experience with this level of multithreading. 
Please correct me if anything is wrong.

> Get rid of the unnecessary load submitter thread pool in tblLoadingMgr
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9140
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9140
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> This JIRA is created as a follow-up to the discussion on 
> https://gerrit.cloudera.org/#/c/14611 related to the various pools used for 
> loading tables.
> It looks like there are 2 pools of threads, both of size 
> {{num_metadata_loading_threads}}. One pool is used to submit the load 
> requests to another pool, {{tblLoadingPool_}}, which does the actual loading of 
> the tables. I think we can get rid of the pool which submits the tasks, since 
> that is not a very time-consuming operation and can be done synchronously (all it 
> needs to do is put the task at the front or back of the queue, depending on 
> whether it's a prioritized load or a background load). This will simplify the 
> loading code and reduce the number of unnecessary threads created by 
> {{TblLoadingMgr}}.
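A minimal sketch of the synchronous submission proposed in the description, assuming the queue is a {{LinkedBlockingDeque}} that the loading threads take from the front. {{DirectSubmitSketch}} and {{enqueueLoad}} are hypothetical names, not Impala's API. The caller enqueues directly, with no submitter pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

public class DirectSubmitSketch {
  // Hypothetical stand-in for tableLoadingDeque_.
  private final LinkedBlockingDeque<String> tableLoadingDeque = new LinkedBlockingDeque<>();

  // Hypothetical enqueue method, run directly on the requesting thread with no
  // submitter pool: prioritized loads go to the front, background loads go to
  // the back, and the loading threads keep taking from the front.
  void enqueueLoad(String table, boolean prioritized) {
    if (prioritized) {
      tableLoadingDeque.offerFirst(table);
    } else {
      tableLoadingDeque.offerLast(table);
    }
  }

  // Returns the deque contents from head (next to load) to tail.
  static List<String> demoOrder() {
    DirectSubmitSketch mgr = new DirectSubmitSketch();
    mgr.enqueueLoad("bg1", false);
    mgr.enqueueLoad("bg2", false);
    mgr.enqueueLoad("hot1", true);
    mgr.enqueueLoad("hot2", true);
    return new ArrayList<>(mgr.tableLoadingDeque);
  }

  public static void main(String[] args) {
    System.out.println(demoOrder()); // [hot2, hot1, bg1, bg2]
  }
}
```

Note that with front insertion, successive prioritized loads are served in reverse arrival order ({{hot2}} before {{hot1}}), while background loads stay FIFO.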



--
This message was sent by Atlassian Jira
(v8.3.4#803005)