Subject: Re: [Bug?] Impala 4.5 Iceberg table loading failure on Tencent 
Cloud COS (cosn://)

Hi,

Thank you for the suggestions. I have some critical updates regarding the 
TableLoadingException: failed to load 1 paths issue on Tencent Cloud COS 
(cosn://).

I have already enabled Local Catalog Mode (--use_local_catalog=true and 
minimal topic mode), but the issue persists.

An analysis of the stack trace by Gemini (Google's AI model) suggested that 
the error in ParallelFileMetadataLoader.loadInternal might be caused by 
concurrency issues when handling object storage paths. Following this logic, 
I tried a workaround: setting metadata_loader_parallelism=1 via ALTER TABLE.

However, the command failed with the same AnalysisException and 
TableLoadingException. This creates a circular dependency: I cannot modify 
the table properties to change the loading behavior, because Impala fails to 
load the table metadata during the analysis phase of ALTER TABLE itself.

Observations:

 - The error originates from ParallelFileMetadataLoader.loadInternal.
 - Even with Local Catalog enabled, the loader seems unable to handle the 
missing block location info from the cosn:// driver.
 - Standard Parquet tables work fine; this appears specific to the Iceberg 
metadata loading path in Impala 4.5.

Question: Is there a way to globally set metadata_loader_parallelism=1 for 
the entire Catalogd, or perhaps a flag to force Impala to ignore block location 
errors for object storage during the initial load?

I am happy to provide a full stack trace or open a JIRA ticket, as this 
currently makes Iceberg on COS unusable in our environment.




============================================================================================================
When I run a simple SELECT * or INSERT INTO query on this table, I get the 
following error:
AnalysisException: Failed to load metadata for table: 
'iceberg_cos_employee_test'
...
IcebergTableLoadingException: Error loading metadata for Iceberg table 
cosn://cdnlogtest-1252412955/impala_test_db/iceberg_cos_employee_test
...
Loading file and block metadata for 1 paths for table ...: failed to load 1 
paths.
============================================================================================================


Best regards,
Original message

From: Zoltán Borók-Nagy <[email protected]>
Sent: December 23, 2025, 23:52
To: dev <[email protected]>
Subject: Re: Advice and Considerations for Building an Impala 
Compute-Storage Separated Architecture



Hi,

Just a few tips off the top of my head:
 - use dedicated coordinators and executors, rule of thumb for
coordinator:executor ratio is 1:50. Though for HA you probably want >1
coordinators.
 - use local catalog mode (aka On-demand metadata):
https://impala.apache.org/docs/build/html/topics/impala_metadata.html
 - enabling remote data cache (with SSD disks) is essential in
compute-storage separated setup:
https://impala.apache.org/docs/build/html/topics/impala_data_cache.html

What table format / file format are you planning to use?
If table format is Iceberg, make sure you use the latest Impala as we
continuously improve Impala's performance on Iceberg.
File format: Impala most efficiently works on Parquet files.

Avoid small file issues:
 - choose proper partitioning for your data, i.e. avoid too coarse-grained
and too fine-grained partitioning. I.e. you probably want more than 200 MB
data per partition, but probably less than 20 GB.
 - compact your tables regularly, for Iceberg tables Impala has the
OPTIMIZE statement:
https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
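
As a rough illustration of the partition-size rule of thumb above, a minimal 
Python sketch could flag partitions that fall outside the 200 MB to 20 GB 
range. The function names and the input dictionary are hypothetical, and the 
thresholds are only the guideline values mentioned above, not hard limits 
enforced by Impala. Partition sizes would have to be collected separately.

```python
MB = 1024 ** 2
GB = 1024 ** 3

# Guideline values from the rule of thumb above (assumed, not enforced by Impala).
MIN_PARTITION_BYTES = 200 * MB
MAX_PARTITION_BYTES = 20 * GB


def classify_partition(size_bytes: int) -> str:
    """Return 'too_fine', 'ok', or 'too_coarse' for one partition size."""
    if size_bytes < MIN_PARTITION_BYTES:
        return "too_fine"
    if size_bytes > MAX_PARTITION_BYTES:
        return "too_coarse"
    return "ok"


def summarize(partition_sizes: dict) -> dict:
    """Group partition names (hypothetical input) by their classification."""
    report = {"too_fine": [], "ok": [], "too_coarse": []}
    for name, size in partition_sizes.items():
        report[classify_partition(size)].append(name)
    return report


if __name__ == "__main__":
    # Made-up example sizes, purely for illustration.
    sizes = {"dt=2025-12-01": 50 * MB, "dt=2025-12-02": 2 * GB, "dt=2025": 40 * GB}
    print(summarize(sizes))
```

Partitions reported as too fine are candidates for coarser partitioning or 
compaction; those reported as too coarse may benefit from an additional 
partition column.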

I hope others chime in as well.

We would love to hear back about your experiences, and feel free to open
tickets for Impala if you run into any issue:
https://issues.apache.org/jira/projects/IMPALA/issues

Cheers,
    Zoltan

On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:

> Dear Impala Team,
>
> I hope this message finds you well.
>
> I am currently planning to build a compute-storage separated architecture
> based on Apache Impala. In this setup:
>
>  - Compute layer: Apache Impala will be used for SQL query execution.
>  - Storage layer: Tencent Cloud Object Storage (COS) will serve as the
> backend storage.
>  - Data ingestion: Kafka will be used to stream data into the system.
>  - Monitoring & visualization: Grafana will be used to display
> operational and performance metrics.
>
> Could you please provide some recommendations and key considerations for
> such an architecture? Specifically, I would appreciate guidance on:
>
>  - Best practices for integrating Impala with cloud object storage (COS).
>  - Performance tuning tips for Impala in a disaggregated environment.
>  - Any known limitations or compatibility issues when using COS as storage.
>  - Recommended configurations for Kafka-to-Impala data pipelines.
>  - Monitoring strategies for tracking query performance and resource usage in
> Grafana.
>
> Thank you very much for your support and advice. I look forward to your
> reply.
>
> Best regards
