[ 
https://issues.apache.org/jira/browse/IMPALA-11509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581325#comment-17581325
 ] 

Gabor Kaszab commented on IMPALA-11509:
---------------------------------------

Note: https://issues.apache.org/jira/browse/IMPALA-11429 adds an extra "ALTER 
TABLE SET OWNER" step after table creation, so once it is submitted, an INSERT 
INTO step will no longer be needed to repro this issue. Simply drop the files 
right after the CREATE TABLE returns in the shell.

> Dropping files of Iceberg during table loading may cause Impalad to stuck in 
> infinite loop
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11509
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11509
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.1.0
>            Reporter: Gabor Kaszab
>            Priority: Critical
>              Labels: iceberg, impala-iceberg
>
> This issue is very similar to 
> https://issues.apache.org/jira/browse/IMPALA-11502. The repro steps are also 
> almost identical; however, in this case the folder of the table should be 
> dropped right when the INSERT INTO starts.
> Repro steps:
> 1) Create the Iceberg table:
> {code:java}
> DROP DATABASE IF EXISTS `drop_incomplete_table` CASCADE;
> CREATE DATABASE `drop_incomplete_table`;
> CREATE TABLE drop_incomplete_table.iceberg_tbl (i int) stored as iceberg
>     tblproperties('iceberg.catalog'='hadoop.catalog',
>                   'iceberg.catalog_location'='/test-warehouse/drop_incomplete_table');
> {code}
> 2) Timing is essential for this step, and it might take a few tries to hit 
> the issue. Run the INSERT INTO and drop the HDFS folder at the same time. 
> Executing them manually is fine; this doesn't require scripting.
> {code:java}
> INSERT INTO drop_incomplete_table.iceberg_tbl VALUES (1), (2), (3);
> hdfs dfs -rm -r hdfs://localhost:20500/test-warehouse/drop_incomplete_table
> {code}
> You will know you have hit the issue when the Impala shell starts to hang. 
> The jstack of the hanging impalad (not the catalogd) will contain this for 
> one of the threads:
> {code:java}
> "Thread-15" #30 prio=5 os_prio=0 tid=0x000000000db2a000 nid=0x56f4 in Object.wait() [0x00007f0e7b59a000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at org.apache.impala.catalog.ImpaladCatalog.waitForCatalogUpdate(ImpaladCatalog.java:290)
>       - locked <0x0000000724f7cdc0> (a java.lang.Object)
>       at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:229)
>       at org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:141)
>       at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2001)
>       at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1913)
>       at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1737)
>       at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164)
> {code}
> Initially, Iceberg tables are created as IncompleteTables, and when a query 
> runs on the table, they are loaded as IcebergTable. It seems to me that when 
> we run the first query after creating the table, dropping the files with 
> certain timing can lead to a state where the table appears as a 
> "missingTable" in StmtMetadataLoader.loadTables(); however, when a 
> prioritized table load is requested, the Catalog says that the table is 
> already loaded.
> As a result, the table always appears as a "missingTable" and we never get 
> out of the [while 
> loop|https://github.com/apache/impala/blob/62e20d1ba842a3f27395251c57dea9850f462fc9/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java#L196]
>  in loadTables().
> I managed to repro this using HiveCatalog, but I had no luck reproducing it 
> with non-Iceberg, traditional Hive tables.
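
The stuck state described in the issue can be modeled roughly as follows. This 
is a hypothetical, simplified sketch and not Impala's actual code: the names 
FakeCatalog, prioritizeLoad, isMissing and the maxIterations guard are invented 
for illustration. It assumes the frontend keeps retrying while the table looks 
missing, and that a prioritized load is a no-op once the catalog believes the 
table is loaded.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the stuck load loop; names are illustrative, not Impala's. */
public class LoadLoopSketch {

    /** Stand-in for the catalog state as seen by the frontend. */
    static final class FakeCatalog {
        // Tables the catalog believes are already loaded.
        private final Set<String> reportedLoaded = new HashSet<>();
        // Tables whose loaded metadata is actually visible to the frontend.
        private final Set<String> visibleToFrontend = new HashSet<>();

        /** Simulates the bad state: "loaded" in the catalog, never visible locally. */
        void markLoadedButInvisible(String tbl) {
            reportedLoaded.add(tbl);
        }

        /** A prioritized load does nothing if the catalog thinks the table is loaded. */
        void prioritizeLoad(String tbl) {
            if (!reportedLoaded.contains(tbl)) {
                reportedLoaded.add(tbl);
                visibleToFrontend.add(tbl);
            }
        }

        boolean isMissing(String tbl) {
            return !visibleToFrontend.contains(tbl);
        }
    }

    /**
     * Retries while the table looks missing. Returns the number of load
     * requests issued, or -1 if the (assumed) iteration guard trips.
     */
    static int loadTable(FakeCatalog catalog, String tbl, int maxIterations) {
        int iterations = 0;
        while (catalog.isMissing(tbl)) {
            if (iterations == maxIterations) {
                return -1; // without this guard the loop would never exit
            }
            catalog.prioritizeLoad(tbl);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        FakeCatalog healthy = new FakeCatalog();
        System.out.println(loadTable(healthy, "db.tbl", 1000));

        FakeCatalog stuck = new FakeCatalog();
        stuck.markLoadedButInvisible("db.tbl");
        System.out.println(loadTable(stuck, "db.tbl", 1000));
    }
}
```

In the healthy case a single load request makes the table visible and the loop 
exits. In the modeled bad state every request is a no-op, so only the 
illustrative guard ends the loop; the loop described in the issue appears to 
have no equivalent exit in this state.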



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

