[
https://issues.apache.org/jira/browse/IMPALA-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972132#comment-16972132
]
Vihang Karajgaonkar commented on IMPALA-9140:
---------------------------------------------
{quote}In short, async load requests only have chance to run after all already
pending sync load requests finish in the above case (poolSize=1). This is what
I think how the second pool balances the sync and async loads.
{quote}
Thanks [~stigahuang]. I think what you said makes sense. So it looks like the
background load thread pool controls the max throughput (the number of async
load requests that can run in parallel at a time), while sync load requests get
executed on demand as soon as a thread becomes available.
There are probably easier ways to implement this. For example, if the desired
throughput of async requests is {{K}} requests in parallel, you can do this using
a single thread which polls the queue such that it can only submit {{K}} requests
at a time. Something like (I may have skipped some details and have the
variable names incorrect):
{code:java}
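ExecutorService backgroundTaskSubmitter = Executors.newSingleThreadExecutor();
backgroundTaskSubmitter.submit(new Runnable() {
  @Override
  public void run() {
    while (true) {
      // Submit the next batch of at most K async requests from the queue.
      // Note: drainTo() does not block, so a real version would need to wait
      // (e.g. with take()) when the queue is empty instead of spinning.
      List<TableName> nextBatch = new ArrayList<>(K);
      int numTasks = tableLoadingDeque_.drainTo(nextBatch, K);
      List<FutureTask<Table>> futures = new ArrayList<>(numTasks);
      for (int i = 0; i < numTasks; i++) {
        // Submit each tableName to the tblLoading pool if the table is
        // not already being loaded, and add the resulting future to futures.
      }
      for (FutureTask<Table> f : futures) {
        try {
          // Wait for the task to complete.
          f.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        } catch (ExecutionException e) {
          // A failed load should not stop the submitter; handling omitted.
        }
      }
    }
  }
});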
{code}
Maybe this approach is not significantly better unless you guys think it
improves the code readability by a lot. Otherwise, we can just close this JIRA
as "Not a problem".
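To make the batching idea above concrete, here is a minimal, self-contained sketch that can be run outside Impala. Everything in it is an assumption for illustration: the class name {{BatchedLoadSketch}}, the {{loadTable}} stand-in that just sleeps, and plain {{String}} table names. It omits the "not already being loaded" check, and it stops when the queue is empty rather than looping forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, self-contained sketch; none of these names come from Impala.
class BatchedLoadSketch {
  private final AtomicInteger running = new AtomicInteger();
  private final AtomicInteger maxRunning = new AtomicInteger();
  private final AtomicInteger loaded = new AtomicInteger();

  // Stand-in for a real table load; records the peak number of concurrent loads.
  private String loadTable(String tableName) throws InterruptedException {
    int now = running.incrementAndGet();
    maxRunning.accumulateAndGet(now, Math::max);
    Thread.sleep(10);  // simulate loading work
    running.decrementAndGet();
    loaded.incrementAndGet();
    return tableName;
  }

  // Drains the queue in batches of at most k and waits for each batch to finish.
  private void drainInBatches(BlockingDeque<String> queue, ExecutorService pool, int k)
      throws InterruptedException, ExecutionException {
    while (!queue.isEmpty()) {
      List<String> batch = new ArrayList<>(k);
      queue.drainTo(batch, k);
      List<Future<String>> futures = new ArrayList<>(batch.size());
      for (String name : batch) {
        Callable<String> task = () -> loadTable(name);
        futures.add(pool.submit(task));
      }
      for (Future<String> f : futures) {
        f.get();  // wait for the whole batch before draining the next one
      }
    }
  }

  // Returns {number of tables loaded, peak number of concurrent loads}.
  int[] run(int numTables, int k, int poolSize) {
    BlockingDeque<String> queue = new LinkedBlockingDeque<>();
    for (int i = 0; i < numTables; i++) {
      queue.add("tbl_" + i);
    }
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      drainInBatches(queue, pool, k);
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
    return new int[] {loaded.get(), maxRunning.get()};
  }

  public static void main(String[] args) {
    int[] result = new BatchedLoadSketch().run(10, 3, 8);
    System.out.println("loaded=" + result[0] + " peak=" + result[1]);
  }
}
```

Because each batch is awaited before the next one is drained, the peak concurrency never exceeds k even when the loading pool has more threads. One trade-off of this design is that a single slow load holds up the start of the next batch.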
> Get rid of the unnecessary load submitter thread pool in tblLoadingMgr
> ----------------------------------------------------------------------
>
> Key: IMPALA-9140
> URL: https://issues.apache.org/jira/browse/IMPALA-9140
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Priority: Major
>
> This JIRA is created as a followup on the discussion on
> https://gerrit.cloudera.org/#/c/14611 related to various pools used for
> loading tables.
> It looks like there are 2 pools of threads, both of size
> {{num_metadata_loading_threads}}. One pool is used to submit the load
> requests to another pool {{tblLoadingPool_}} which does the actual loading of
> the tables. I think we can get rid of the pool which submits the tasks since
> it is not a very time-consuming operation and can be done synchronously (all it
> needs to do is submit the task to the front or back of the queue based on
> whether it's a prioritized load or a background load). This will simplify the
> loading code and reduce the unnecessary number of threads being created by
> {{TblLoadingMgr}}.
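
The synchronous submission described in the issue above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class {{SyncEnqueueSketch}} and its methods are hypothetical, table names are plain strings, and the real {{TblLoadingMgr}} code may differ.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: enqueue on the caller's thread, no submitter pool.
class SyncEnqueueSketch {
  private final BlockingDeque<String> deque = new LinkedBlockingDeque<>();

  // Prioritized loads go to the front of the deque; background loads to the back.
  void enqueue(String tableName, boolean prioritized) {
    if (prioritized) {
      deque.offerFirst(tableName);
    } else {
      deque.offerLast(tableName);
    }
  }

  // Returns the next table to load, or null if the deque is empty.
  String next() {
    return deque.pollFirst();
  }
}
```

Since inserting at either end of the deque is cheap, the caller can do it inline, and a prioritized request enqueued after background requests is still picked up first.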