[
https://issues.apache.org/jira/browse/IMPALA-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-11464.
-------------------------------------
Fix Version/s: Impala 4.2.0
Resolution: Fixed
> hasNext() throws FileNotFoundException on staging files and breaks file
> metadata loading
> ----------------------------------------------------------------------------------------
>
> Key: IMPALA-11464
> URL: https://issues.apache.org/jira/browse/IMPALA-11464
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0, Impala 4.1.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.2.0
>
> Attachments: Selection_034.png, hdfs-touch-1002-files.sh
>
>
> HDFS provides a RemoteIterator for listing files. The two key interfaces are
> hasNext() and next(). We only handle the FileNotFoundException thrown in
> next(). However, hasNext() could also thrown FileNotFoundException for
> staging files:
> * The HDFS client will do a partial listing when creating the
> RemoteIterator. By default it only list 1000 items under the dir. The limit
> is configured by
> "[dfs.ls.limit|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.ls.limit]".
> * Usually the hasNext() call just consume the partial list and it's enough.
> However, if the dir contains more than 1000 items. The 1001-th hasNext() call
> will do a continue listing. This could hit FileNotFoundException if the dir
> was removed.
> Such a FileNotFoundException thrown from hasNext() will break the metadata
> loading:
> https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L948
> *Reproducing*
> To stably reproduce the issue, I use IntelliJ to stop catalogd at some point,
> and then manually delete the dir. Here are the detailed steps:
> Create a table and run a simple query on it to let catalogd cache its
> metadata.
> {code:sql}
> impala-shell> create table my_alltypes like functional.alltypes;
> impala-shell> desc my_alltypes;
> {code}
> Manually create staging dir with more than 1000 files in it.
> {code:bash}
> hdfs dfs -mkdir -p
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
> bash hdfs-touch-1002-files.sh
> {code}
> In IntelliJ, set a breakpoint at this line and attach to a running catalogd:
> https://github.com/apache/impala/blob/89c3e1f821ccd335c3c5507496bb53b80c1cc07a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L987
> In Hive, trigger an INSERT on this table.
> {code:sql}
> $ beeline -u "jdbc:hive2://localhost:11050"
> beeline> insert into my_alltypes partition(year=2009, month=4) select
> id,bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col,
> double_col, date_string_col, string_col, timestamp_col from
> functional.alltypes where year=2009 and month=4;
> {code}
> IntelliJ will prompt out for debugging. Step over until the partial listing
> is done on the staging dir "-ext-10001", as the screenshot shows.
> Manually remove the staging dir "-ext-10001"
> {code}
> hdfs dfs -rm -r -skipTrash
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
> Deleted
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
> {code}
> Then stop debugging to resume the process. I can see the following errors in
> catalogd. Event processor is broken and requires a global INVALIDATE METADATA
> to recover.
> {noformat}
> I0730 10:36:27.031431 9452 FileSystemUtil.java:837] Ignoring
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/996
> since it is either in a hidden directory or a temporary staging directory
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
> I0730 10:36:27.031471 9452 FileSystemUtil.java:837] Ignoring
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001/997
> since it is either in a hidden directory or a temporary staging directory
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1
> E0730 10:36:27.034335 9452 ParallelFileMetadataLoader.java:171] Loading file
> and block metadata for 1 paths for table default.my_alltypes encountered an
> error loading data for path
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4
> Java exception follows:
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
> does not exist.
> at
> com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
> at
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
> at
> com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
> at
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
> at
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
> at
> org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
> at
> org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
> at
> org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
> at
> org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
> at
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: File
> hdfs://localhost:20500/test-warehouse/my_alltypes/year=2009/month=4/.hive-staging_hive_2022-07-30_09-34-05_894_3245867416204249845-1/-ext-10001
> does not exist.
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:1317)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:1288)
> at
> org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:948)
> at
> org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:888)
> at
> org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:162)
> at
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> at
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> at
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> at
> com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> at
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
> at
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:162)
> ... 17 more
> E0730 10:36:27.034795 9452 MetastoreEventsProcessor.java:841] Event
> processing needs a invalidate command to resolve the state
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException:
> EventId: 728392 EventType: ADD_PARTITION Failed to refresh newly added
> partitions of table {}. Event processing cannot continue. Issue an invalidate
> metadata command to reset event processor.
> at
> org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1921)
> at
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:511)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:935)
> at
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:833)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.impala.catalog.TableLoadingException: Loading file and
> block metadata for 1 paths for table default.my_alltypes: failed to load 1
> paths. Check the catalog server log for more details.
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:185)
> at
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
> at
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:780)
> at
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:743)
> at
> org.apache.impala.catalog.HdfsTable.createAndLoadPartitions(HdfsTable.java:904)
> at
> org.apache.impala.service.CatalogOpExecutor.addHdfsPartitions(CatalogOpExecutor.java:1465)
> at
> org.apache.impala.service.CatalogOpExecutor.addPartitionsIfNotRemovedLater(CatalogOpExecutor.java:4068)
> at
> org.apache.impala.catalog.events.MetastoreEvents$AddPartitionEvent.process(MetastoreEvents.java:1907)
> ... 10 more
> I0730 10:36:27.622951 9547 catalog-server.cc:400] A catalog update with 1
> entries is assembled. Catalog version: 1697 Last sent catalog version: 1696
> I0730 10:36:27.625458 9536 catalog-server.cc:813] Collected update:
> 1:CATALOG_SERVICE_ID, version=1698, original size=60, compressed size=58
> W0730 10:36:28.035472 9452 MetastoreEventsProcessor.java:826] Event
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id
> is 728391
> W0730 10:36:29.035904 9452 MetastoreEventsProcessor.java:826] Event
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id
> is 728391
> I0730 10:36:29.623289 9547 catalog-server.cc:400] A catalog update with 1
> entries is assembled. Catalog version: 1698 Last sent catalog version: 1697
> W0730 10:36:30.036418 9452 MetastoreEventsProcessor.java:826] Event
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id
> is 728391
> W0730 10:36:31.036957 9452 MetastoreEventsProcessor.java:826] Event
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id
> is 728391
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)