[
https://issues.apache.org/jira/browse/IMPALA-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049057#comment-18049057
]
Quanlong Huang commented on IMPALA-13491:
-----------------------------------------
For 3, we already have a flag, num_metadata_loading_threads, to control the
parallelism of the [table loading
pool|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java#L148]
which serves the background load requests, prioritized load requests and sync
load requests (i.e. requests from
[getOrLoadTable()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2824]).
The code paths that still need concurrency control are:
* [CatalogOpExecutor.loadTableMetadata()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1827],
invoked by DDL/DML threads to reload the table's metadata.
* [CatalogServiceCatalog.reloadTable()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L3197],
invoked by REFRESH.
So the Semaphore should control the combined parallelism of the table loading
threads plus the currently unbounded DDL/DML threads listed above.
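To make the idea concrete, here is a minimal sketch of one shared Semaphore that both the loading pool workers and the DDL/DML/REFRESH threads would go through. The class name LoadThrottle, its methods, and the maxConcurrentLoads parameter are hypothetical and only for illustration; none of them exist in Impala, and the real patch may be structured differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not an Impala class): caps how many metadata
 * load/reload operations run at once, whichever thread starts them.
 */
final class LoadThrottle {
  private final Semaphore permits;
  // Bookkeeping so the cap can be observed; not needed for correctness.
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  /** maxConcurrentLoads is an assumed config value, e.g. from a new flag. */
  LoadThrottle(int maxConcurrentLoads) {
    // Fair mode: waiting threads get permits in FIFO order.
    this.permits = new Semaphore(maxConcurrentLoads, true);
  }

  /** Blocks until a permit is free, runs the load, then releases the permit. */
  <T> T run(Callable<T> loadOp) throws Exception {
    permits.acquire();
    try {
      int now = inFlight.incrementAndGet();
      peak.accumulateAndGet(now, Math::max);
      return loadOp.call();
    } finally {
      inFlight.decrementAndGet();
      permits.release();
    }
  }

  /** Highest number of operations observed running at the same time. */
  int peakConcurrency() {
    return peak.get();
  }
}
```

In this sketch, a pool worker, loadTableMetadata() and reloadTable() would each wrap the actual load in throttle.run(...), so they all draw from the same set of permits. acquire() blocks indefinitely here; a timed tryAcquire() would be where the timeout behavior from 5 fits in.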
For 5, we need to define the timeout behavior. We can do it in a separate JIRA.
> Add config on catalogd for controlling the number of concurrent
> loading/refresh commands
> ----------------------------------------------------------------------------------------
>
> Key: IMPALA-13491
> URL: https://issues.apache.org/jira/browse/IMPALA-13491
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Manish Maheshwari
> Assignee: Arnab Karmakar
> Priority: Critical
>
> When running table loading or refresh commands, catalogd requires working
> memory in proportion to the number of tables being refreshed. While we have a
> table-level lock, we don't have a config to control concurrent load/refresh
> operations.
> For customers that run refresh in parallel in multiple threads, the
> number of load/refresh commands can cause an OOM on the catalog due to
> running out of working memory.