[ 
https://issues.apache.org/jira/browse/IMPALA-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078460#comment-18078460
 ] 

Jason Fehr commented on IMPALA-14949:
-------------------------------------

I was able to replicate this issue:
1. Update CatalogServiceCatalog.java by adding the following sleep at [line 2606|https://github.com/apache/impala/blob/042b915c9ec7feb0398bdec84027e908ada59725/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2606]:
{code:java}
try {
  Thread.sleep(2000);
} catch (InterruptedException e) {
  LOG.warn("Sleep was interrupted", e);
}
{code}
2. Run 25 instances of this script concurrently, passing `999999` as the first parameter:

{code:bash}
#!/bin/bash

ITERATIONS="${1:-1}"
for x in $(seq 1 "${ITERATIONS}"); do
  ./bin/impala-shell.sh -q "invalidate metadata functional_parquet.widetable_1000_cols; refresh functional_parquet.widetable_1000_cols; select * from functional_parquet.widetable_1000_cols;"
done
{code}
3. Kill the catalogd process.
4. Start the catalogd process with this command:
{code:bash}
./be/build/latest/service/catalogd -logbufsecs=5 -v=1 -max_log_files=10 \
  -log_rotation_match_pid=true -log_filename=catalogd \
  -log_dir=/home/impdev/impala/logs/cluster -kudu_master_hosts 127.0.0.1 \
  --reset_metadata_lock_duration_ms=1 -catalog_service_port=26000 \
  -state_store_subscriber_port=23020 -webserver_port=25020 \
  --debug_actions=reset_metadata_loop_unlocked:SLEEP@250
{code}




> Catalogd Deadlock During Initial Invalidate
> -------------------------------------------
>
>                 Key: IMPALA-14949
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14949
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Jason Fehr
>            Assignee: Jason Fehr
>            Priority: Critical
>
> A potential deadlock exists in catalogd during startup when the initial 
> global invalidate all happens under heavy load (for example, when catalogd 
> restarts after it was killed for running out of memory).
> Sequence of events:
> 1. Initial invalidate all takes too long and releases versionLock_ 
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2620].
> 2. Another catalogd operation (such as getPartialCatalogObject) takes a read 
> lock on versionLock_ (for example, 
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2808]).
> 3. The catalogd operation then waits for the initial invalidate all to 
> finish (for example, 
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L523]).
> 4. The initial invalidate all cannot continue because it needs the write 
> lock on versionLock_ 
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2625].
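
The lock ordering in the sequence of events can be reproduced in isolation. The following is a minimal sketch, not the catalogd code: it assumes versionLock_ behaves like a fair java.util.concurrent.locks.ReentrantReadWriteLock, and the class, method, and latch names are hypothetical stand-ins. A bounded tryLock replaces the blocking write-lock call so that the program reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone model of the four steps; none of these names exist in Impala.
class VersionLockDeadlockSketch {

  // Returns true if the simulated initial invalidate re-acquired the write lock
  // within timeoutMs, false if the waiting reader blocked it.
  static boolean simulate(long timeoutMs) throws InterruptedException {
    // Assumption: versionLock_ is modeled as a fair read/write lock.
    ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock(true);
    CountDownLatch initialResetDone = new CountDownLatch(1);
    CountDownLatch readerHoldsLock = new CountDownLatch(1);

    Thread reader = new Thread(() -> {
      versionLock.readLock().lock();        // Step 2: take the read lock.
      try {
        readerHoldsLock.countDown();
        initialResetDone.await();           // Step 3: wait for the reset while holding it.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        versionLock.readLock().unlock();
      }
    });
    reader.setDaemon(true);

    // Step 1: the initial invalidate holds the write lock, then releases it.
    versionLock.writeLock().lock();
    versionLock.writeLock().unlock();

    reader.start();
    readerHoldsLock.await();

    // Step 4: the initial invalidate needs the write lock again. With a plain
    // lock() call this would block forever; tryLock bounds the wait.
    boolean reacquired = versionLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    if (reacquired) {
      versionLock.writeLock().unlock();
    }

    // Let the reader finish so the sketch terminates cleanly.
    initialResetDone.countDown();
    reader.join();
    return reacquired;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("write lock re-acquired: " + simulate(500));
  }
}
```

In this model the write lock is never re-acquired while the reader is waiting, which mirrors step 4. Either the reader must not wait for the reset while holding the read lock, or the reset must not need the write lock while readers can be waiting on it.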


