Subject: Re: [Bug?] Impala 4.5 Iceberg table loading failure on Tencent
Cloud COS (cosn://)
Hi,
Thank you for the suggestions. I have some important updates regarding the
TableLoadingException ("failed to load 1 paths") issue on Tencent Cloud COS
(cosn://).
I have already enabled Local Catalog Mode (--use_local_catalog=true on the
coordinators and the minimal catalog topic mode on catalogd), but the
issue persists.
Based on an analysis of the stack trace by Gemini (Google's AI model), the
failure in ParallelFileMetadataLoader.loadInternal might be caused by
concurrency issues when handling object storage paths. Following this
suggestion, I tried a workaround: setting metadata_loader_parallelism=1 via
ALTER TABLE.
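For reference, the statement I attempted looked roughly like this (the
property key is the one that was suggested to me; I am not sure it is a
recognized table property):

  ALTER TABLE impala_test_db.iceberg_cos_employee_test
  SET TBLPROPERTIES ('metadata_loader_parallelism'='1');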
However, the command failed with the same AnalysisException and
TableLoadingException. It seems Impala ends up in a chicken-and-egg
situation: I cannot modify the table properties to fix the loading behavior,
because Impala fails to load the table metadata even during the analysis
phase of the ALTER TABLE itself.
Observations:
- The error originates from ParallelFileMetadataLoader.loadInternal.
- Even with Local Catalog enabled, the loader seems unable to handle the
missing block location info from the cosn:// driver.
- Standard Parquet tables work fine; this appears specific to the Iceberg
metadata loading path in Impala 4.5.
Question: Is there a way to globally set metadata_loader_parallelism=1 for
the entire Catalogd, or perhaps a flag to force Impala to ignore block location
errors for object storage during the initial load?
I am happy to provide a full stack trace or open a JIRA ticket, as this
currently makes Iceberg on COS unusable in our environment.
============================================================================================================
When I run a simple SELECT * or INSERT INTO query on this table, I get the
following error:
AnalysisException: Failed to load metadata for table:
'iceberg_cos_employee_test'
...
IcebergTableLoadingException: Error loading metadata for Iceberg table
cosn://cdnlogtest-1252412955/impala_test_db/iceberg_cos_employee_test
...
Loading file and block metadata for 1 paths for table ...: failed to load 1
paths.
============================================================================================================
Best regards,
Original Message
From: Zoltán Borók-Nagy <[email protected]>
Sent: December 23, 2025, 23:52
To: dev <[email protected]>
Subject: Re: Advice and Considerations for Building an Impala
Compute-Storage Separated Architecture
Hi,
Just a few tips off the top of my head:
- use dedicated coordinators and executors; a rule of thumb for the
coordinator:executor ratio is 1:50, though for HA you probably want more
than one coordinator.
- use local catalog mode (aka On-demand metadata):
https://impala.apache.org/docs/build/html/topics/impala_metadata.html
- enabling the remote data cache (backed by SSD disks) is essential in a
compute-storage separated setup (a rough flag sketch follows this list):
https://impala.apache.org/docs/build/html/topics/impala_data_cache.html
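For illustration, the relevant startup flags look roughly like this (the
cache paths and quota are placeholders, not recommendations):

  # catalogd: minimal catalog topic for on-demand metadata
  catalogd --catalog_topic_mode=minimal ...
  # impalad (coordinators/executors): local catalog + remote data cache
  impalad --use_local_catalog=true \
          --data_cache=/ssd1/impala-cache,/ssd2/impala-cache:500GB ...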
What table format / file format are you planning to use?
If table format is Iceberg, make sure you use the latest Impala as we
continuously improve Impala's performance on Iceberg.
File format: Impala most efficiently works on Parquet files.
Avoid small file issues:
- choose proper partitioning for your data, i.e. avoid partitioning that is
too coarse-grained or too fine-grained. You probably want more than 200 MB
of data per partition, but less than 20 GB (see the sketch after this list).
- compact your tables regularly; for Iceberg tables Impala has the
OPTIMIZE statement:
https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
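To make this concrete, a minimal sketch (table name, columns, and the day()
granularity are only examples; adjust to your data volume):

  -- a date-based partition spec keeps partitions reasonably sized
  CREATE TABLE events (id BIGINT, ts TIMESTAMP, payload STRING)
  PARTITIONED BY SPEC (day(ts))
  STORED AS ICEBERG;

  -- periodically rewrite small files into larger ones
  OPTIMIZE TABLE events;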
I hope others chime in as well.
We would love to hear back about your experiences, and feel free to open
tickets for Impala if you run into any issue:
https://issues.apache.org/jira/projects/IMPALA/issues
Cheers,
Zoltan
On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:
> Dear Impala Team,
>
> I hope this message finds you well.
>
> I am currently planning to build a compute-storage separated architecture
> based on Apache Impala. In this setup:
>
> - Compute layer: Apache Impala will be used for SQL query execution.
> - Storage layer: Tencent Cloud Object Storage (COS) will serve as the
> backend storage.
> - Data ingestion: Kafka will be used to stream data into the system.
> - Monitoring & visualization: Grafana will be used to display
> operational and performance metrics.
>
> Could you please provide some recommendations and key considerations for
> such an architecture? Specifically, I would appreciate guidance on:
>
> - Best practices for integrating Impala with cloud object storage (COS).
> - Performance tuning tips for Impala in a disaggregated environment.
> - Any known limitations or compatibility issues when using COS as storage.
> - Recommended configurations for Kafka-to-Impala data pipelines.
> - Monitoring strategies for tracking query performance and resource usage
> in Grafana.
>
> Thank you very much for your support and advice. I look forward to your
> reply.
>
> Best regards