[ 
https://issues.apache.org/jira/browse/IMPALA-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049057#comment-18049057
 ] 

Quanlong Huang commented on IMPALA-13491:
-----------------------------------------

For 3, we already have a flag, num_metadata_loading_threads, to control the 
parallelism of the [table loading 
pool|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java#L148]
 which serves the background load requests, prioritized load requests and sync 
load requests (i.e. requests from 
[getOrLoadTable()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2824]).

The parts that still need concurrency control are:
 * 
[CatalogOpExecutor.loadTableMetadata()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1827]
 invoked by DDL/DML threads to reload metadata of the table.
 * 
[CatalogServiceCatalog.reloadTable()|https://github.com/apache/impala/blob/52403541f2e11b6eeaaac849b2a3c739e80a6c2d/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L3197]
 invoked by REFRESH.

So the Semaphore should limit the combined parallelism of the table loading 
threads and the currently unbounded DDL/DML threads listed above.
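
A minimal sketch of that idea, not actual Impala code: a single Semaphore is 
shared by every path that loads or reloads table metadata. The names 
LoadConcurrencyLimiter, runLimited and maxConcurrentLoads are hypothetical, 
and no timeout is handled here since that behavior is still undefined (item 5).

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Impala): caps how many metadata
 * load/reload operations run at once, whichever thread type starts them.
 */
final class LoadConcurrencyLimiter {
  private final Semaphore permits;

  LoadConcurrencyLimiter(int maxConcurrentLoads) {
    // Fair mode hands out permits in arrival order, so a burst of
    // REFRESH requests cannot starve earlier waiters.
    this.permits = new Semaphore(maxConcurrentLoads, true);
  }

  /** Runs loadOp while holding one permit; blocks while none is free. */
  <T> T runLimited(Supplier<T> loadOp) {
    // Simplification: waits without reacting to interrupts or timeouts.
    permits.acquireUninterruptibly();
    try {
      return loadOp.get();
    } finally {
      // Always return the permit, even if the load throws.
      permits.release();
    }
  }

  /** Number of permits currently free (useful for metrics or tests). */
  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Under this assumption, the table loading pool threads, 
CatalogOpExecutor.loadTableMetadata() and CatalogServiceCatalog.reloadTable() 
would each wrap their loading work in runLimited() on the same instance. One 
thing to watch: a path that already holds a permit must not call into another 
guarded path, or it could block on itself once all permits are taken.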

For 5, we need to define the timeout behavior. We can do it in a separate JIRA.

> Add config on catalogd for controlling the number of concurrent 
> loading/refresh commands
> ----------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13491
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13491
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Manish Maheshwari
>            Assignee: Arnab Karmakar
>            Priority: Critical
>
> When running table loading or refresh commands, catalogd requires working 
> memory in proportion to the number of tables being refreshed. While we have a 
> table-level lock, we don't have a config to control concurrent load/refresh 
> operations.
> For customers that run refresh in parallel from multiple threads, the 
> number of load/refresh commands can cause OOM on catalogd due to running 
> out of working memory.


