[
https://issues.apache.org/jira/browse/IMPALA-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078460#comment-18078460
]
Jason Fehr commented on IMPALA-14949:
-------------------------------------
I was able to replicate this issue:
1. Update CatalogServiceCatalog.java, adding the following sleep at [line 2606|https://github.com/apache/impala/blob/042b915c9ec7feb0398bdec84027e908ada59725/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2606]:
{code:java}
try {
  Thread.sleep(2000);
} catch (InterruptedException e) {
  LOG.warn("Sleep was interrupted", e);
}
{code}
2. Run 25 instances of this script, passing `999999` as the first parameter:
{code:bash}
#!/bin/bash
ITERATIONS="${1:-1}"
for x in $(seq 1 "${ITERATIONS}"); do
  ./bin/impala-shell.sh -q "invalidate metadata functional_parquet.widetable_1000_cols;
    refresh functional_parquet.widetable_1000_cols;
    select * from functional_parquet.widetable_1000_cols;"
done
{code}
3. Kill the catalogd process.
4. Start the catalogd process with this command:
{code:bash}
./be/build/latest/service/catalogd -logbufsecs=5 -v=1 -max_log_files=10 \
  -log_rotation_match_pid=true -log_filename=catalogd \
  -log_dir=/home/impdev/impala/logs/cluster -kudu_master_hosts 127.0.0.1 \
  --reset_metadata_lock_duration_ms=1 -catalog_service_port=26000 \
  -state_store_subscriber_port=23020 -webserver_port=25020 \
  --debug_actions=reset_metadata_loop_unlocked:SLEEP@250
{code}
> Catalogd Deadlock During Initial Invalidate
> -------------------------------------------
>
> Key: IMPALA-14949
> URL: https://issues.apache.org/jira/browse/IMPALA-14949
> Project: IMPALA
> Issue Type: Bug
> Reporter: Jason Fehr
> Assignee: Jason Fehr
> Priority: Critical
>
> A potential deadlock exists in catalogd during startup when the initial
> global invalidate all happens under heavy load (for example, when catalogd
> restarts after it was killed for running out of memory).
> Sequence of events:
> 1. Initial invalidate all takes too long and releases versionLock_
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2620].
> 2. Another catalogd operation (such as getPartialCatalogObject) takes a read
> lock on versionLock_ (for example,
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2808]).
> 3. The catalogd operation then waits for the initial invalidate all to
> finish (for example,
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L523])
> 4. The initial invalidate all cannot continue as it needs the write lock of
> versionLock_
> [here|https://github.com/apache/impala/blob/89b3307e377351e6920929dec75ee08bfc9f5f4a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2625].
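The four-step sequence in the issue description can be sketched in isolation as follows. This is a minimal illustration, not the actual catalogd code: the class name, the latches, and the method names are hypothetical stand-ins, the fair mode of the lock is an assumption, and timeouts replace the indefinite waits so that the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch of the lock ordering problem; not Impala code.
final class VersionLockDeadlockSketch {
  // Stand-in for versionLock_ (fair mode is an assumption of this sketch).
  private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock(true);
  // Hypothetical signal for "the initial invalidate all has finished".
  private final CountDownLatch resetDone = new CountDownLatch(1);
  // Test-only latches that force the interleaving from the description.
  private final CountDownLatch writeLockReleased = new CountDownLatch(1);
  private final CountDownLatch readLockHeld = new CountDownLatch(1);

  private volatile boolean resetGotWriteLock = true;
  private volatile boolean readerSawResetDone = true;

  // Returns true when neither side could make progress within its timeout.
  public static boolean reproducesDeadlock() {
    VersionLockDeadlockSketch s = new VersionLockDeadlockSketch();
    Thread reset = new Thread(s::initialReset, "initial-reset");
    Thread reader = new Thread(s::catalogRead, "catalog-read");
    reset.start();
    reader.start();
    try {
      reset.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !s.resetGotWriteLock && !s.readerSawResetDone;
  }

  private void initialReset() {
    try {
      versionLock.writeLock().lock();
      // Step 1: the reset has held the lock too long, so it releases it.
      versionLock.writeLock().unlock();
      writeLockReleased.countDown();
      readLockHeld.await();
      // Step 4: the reset needs the write lock again, but a reader holds the lock.
      resetGotWriteLock = versionLock.writeLock().tryLock(500, TimeUnit.MILLISECONDS);
      if (resetGotWriteLock) {
        versionLock.writeLock().unlock();
        resetDone.countDown();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void catalogRead() {
    try {
      writeLockReleased.await();
      // Step 2: another operation takes the read lock while the reset is unlocked.
      versionLock.readLock().lock();
      try {
        readLockHeld.countDown();
        // Step 3: it waits for the reset to finish while still holding the read lock.
        readerSawResetDone = resetDone.await(1000, TimeUnit.MILLISECONDS);
      } finally {
        versionLock.readLock().unlock();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("deadlock reproduced: " + reproducesDeadlock());
  }
}
```

In the sketch the reader waits twice as long as the reset's lock attempt, so both waits are expected to time out. Without the timeouts the two threads would block each other indefinitely, which matches the hang described above.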