[ 
https://issues.apache.org/jira/browse/IMPALA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971298#comment-16971298
 ] 

Quanlong Huang commented on IMPALA-9140:
----------------------------------------

I think it makes sense to have two pools. There are two kinds of load requests, 
async and sync, and we need to handle them differently:
 * Async load requests are for prioritized load or background load.
 * Sync load requests are from DDL/DML requests.

The main difference is that sync load requests are issued and waited on by the 
DDL/DML threads themselves. These threads add the tables to the catalog after 
they are loaded. For async load requests, something else has to wait for the 
load and add the loaded table to the catalog. That is the job of the threads in 
the thread pool created by TableLoadingMgr.startTableLoadingThreads().

To be specific, for an AlterTable DDL, the code path is:
{code:java}
CatalogOpExecutor.alterTable()
|-- CatalogOpExecutor.getExistingTable()
| |-- CatalogServiceCatalog.getOrLoadTable()
| |-- // Load the table if it's unloaded.
| |-- // Issue load request to tblLoadingPool_ in loadAsync()
| |-- TableLoadingMgr.loadAsync()
| |-- Add the loaded table into catalog
|-- // Continue AlterTable logic
{code}
For a prioritizedLoad or backgroundLoad request, the code path just adds the 
TableName to tableLoadingDeque_ if the table is not already being loaded:
{code:java}
 public void backgroundLoad(TTableName tblName) {
   // Only queue for background loading if the table isn't already queued or
   // currently being loaded.
   if (tableLoadingBarrier_.putIfAbsent(tblName, new AtomicBoolean(false)) == null) {
     tableLoadingDeque_.offerLast(tblName);
   }
 }
{code}
Threads in the second pool, the thread pool created in 
TableLoadingMgr.startTableLoadingThreads(), actually issue the load requests 
and add the loaded tables to the catalog. This pool ensures that at most 
numLoadingThreads_ async requests run concurrently, which balances the async 
and sync load requests: in tblLoadingPool_, at most numLoadingThreads_ tasks 
come from the async side. The other tasks come from the sync side (DDL/DML), 
which has higher priority.
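
To make the division of labour concrete, here is a minimal, self-contained 
sketch of the two-pool arrangement described above. It is not the actual 
TableLoadingMgr code: the class name TwoPoolSketch, the String-based tables, 
the catalog map, the maxAsyncInFlight counter, and the simulated load() are all 
hypothetical, and the loading pool is an unbounded cached pool purely for 
simplicity (how tblLoadingPool_ is really sized is not covered here).

```java
import java.util.Map;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the two-pool design; not Impala's real classes. */
class TwoPoolSketch {
  // Pool that does the actual loading (stands in for tblLoadingPool_).
  private final ExecutorService loadingPool = Executors.newCachedThreadPool();
  // Second pool: takes async requests, waits for them, updates the catalog.
  private final ExecutorService submitterPool;
  private final BlockingDeque<String> deque = new LinkedBlockingDeque<>();
  private final ConcurrentMap<String, Boolean> barrier = new ConcurrentHashMap<>();
  final Map<String, String> catalog = new ConcurrentHashMap<>();
  private final AtomicInteger asyncInFlight = new AtomicInteger();
  final AtomicInteger maxAsyncInFlight = new AtomicInteger();

  TwoPoolSketch(int numLoadingThreads) {
    submitterPool = Executors.newFixedThreadPool(numLoadingThreads);
    for (int i = 0; i < numLoadingThreads; i++) {
      submitterPool.execute(this::submitterLoop);
    }
  }

  /** Async path: only enqueue; nobody on the caller side waits. */
  void backgroundLoad(String tbl) {
    if (barrier.putIfAbsent(tbl, Boolean.TRUE) == null) deque.offerLast(tbl);
  }

  /** Async path, but queued at the front so it is picked up first. */
  void prioritizedLoad(String tbl) {
    if (barrier.putIfAbsent(tbl, Boolean.TRUE) == null) deque.offerFirst(tbl);
  }

  /** Sync path: the calling (DDL/DML) thread submits, waits, and adds the table. */
  String getOrLoadTable(String tbl) {
    String loaded = catalog.get(tbl);
    if (loaded != null) return loaded;
    Callable<String> task = () -> load(tbl);
    try {
      loaded = loadingPool.submit(task).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("load failed: " + tbl, e);
    }
    catalog.put(tbl, loaded);
    return loaded;
  }

  // Each submitter thread handles one async request at a time, so at most
  // numLoadingThreads async loads are in the loading pool concurrently.
  private void submitterLoop() {
    try {
      while (true) {
        String tbl = deque.takeFirst();
        maxAsyncInFlight.accumulateAndGet(asyncInFlight.incrementAndGet(), Math::max);
        try {
          Callable<String> task = () -> load(tbl);
          catalog.put(tbl, loadingPool.submit(task).get());
        } catch (ExecutionException e) {
          // A failed load is simply skipped in this sketch.
        } finally {
          asyncInFlight.decrementAndGet();
          barrier.remove(tbl);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // shutdown requested
    }
  }

  // Simulated metadata load.
  private static String load(String tbl) throws InterruptedException {
    Thread.sleep(20);
    return "loaded:" + tbl;
  }

  void shutdown() {
    submitterPool.shutdownNow();
    loadingPool.shutdownNow();
  }
}
```

In this sketch, the async cap comes only from the number of submitter threads: 
removing that pool and enqueueing straight into the loading pool would need 
some other mechanism to limit async loads and to add their results to the 
catalog.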

> Get rid of the unnecessary load submitter thread pool in tblLoadingMgr
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9140
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9140
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>
> This JIRA is created as a followup on the discussion on 
> https://gerrit.cloudera.org/#/c/14611 related to various pools used for 
> loading tables.
> It looks like there are 2 pools of threads both of the size 
> {{num_metadata_loading_threads}}. One pool is used to submit the load 
> requests to another pool {{tblLoadingPool_}} which does the actual loading of 
> the tables. I think we can get rid of the pool which submits the tasks since 
> it is not very time-consuming operation and can be done synchronously (all it 
> needs to do submit the task in the queue in the front or back based on 
> whether its a prioritized load or background load). This will simplify the 
> loading code and  reduce unnecessary number of threads being created by 
> {{TblLoadingMgr}}


