Subject: Re: [Bug?] Impala 4.5 Iceberg table loading failure on Tencent
Cloud COS (cosn://)
Hi,
Thank you for the suggestions. I have some important updates regarding the
TableLoadingException ("failed to load 1 paths") issue on Tencent Cloud COS
(cosn://).
I have already enabled Local Catalog Mode (--use_local_catalog=true on the
coordinators and the minimal catalog topic mode on catalogd), but the
issue persists.
Based on an analysis of the stack trace by Gemini (Google's AI model), the
failure in ParallelFileMetadataLoader.loadInternal might be caused by
concurrency issues when handling object storage paths. Following this
suggestion, I tried a workaround: setting metadata_loader_parallelism=1 via
ALTER TABLE.
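For reference, the statement I attempted looked roughly like this (the
property key is the one that was suggested to me; I am not sure it is a
recognized table property):

  ALTER TABLE impala_test_db.iceberg_cos_employee_test
  SET TBLPROPERTIES ('metadata_loader_parallelism'='1');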
However, the command failed with the same AnalysisException and
TableLoadingException. It seems Impala ends up in a chicken-and-egg
situation: I cannot modify the table properties to fix the loading behavior,
because Impala fails to load the table metadata even during the analysis
phase of the ALTER TABLE itself.
Observations:
- The error originates from ParallelFileMetadataLoader.loadInternal.
- Even with Local Catalog enabled, the loader seems unable to handle the
missing block location info from the cosn:// driver.
- Standard Parquet tables work fine; this appears specific to the Iceberg
metadata loading path in Impala 4.5.
Question: Is there a way to globally set metadata_loader_parallelism=1 for
the entire Catalogd, or perhaps a flag to force Impala to ignore block location
errors for object storage during the initial load?
I am happy to provide a full stack trace or open a JIRA ticket, as this
currently makes Iceberg on COS unusable in our environment.
============================================================================================================
When I run a simple SELECT * or INSERT INTO query on this table, I get the
following error:
AnalysisException: Failed to load metadata for table:
'iceberg_cos_employee_test'
...
IcebergTableLoadingException: Error loading metadata for Iceberg table
cosn://cdnlogtest-1252412955/impala_test_db/iceberg_cos_employee_test
...
Loading file and block metadata for 1 paths for table ...: failed to load 1
paths.
============================================================================================================
Best regards,
Original Message
From: Zoltán Borók-Nagy <[email protected]>
Sent: December 23, 2025, 23:52
To: dev <[email protected]>
Subject: Re: Advice and Considerations for Building an Impala
Compute-Storage Separated Architecture
Hi,
Just a few tips off the top of my head:
- use dedicated coordinators and executors; a rule of thumb for the
coordinator:executor ratio is 1:50, though for HA you probably want more
than one coordinator.
- use local catalog mode (aka On-demand metadata):
https://impala.apache.org/docs/build/html/topics/impala_metadata.html
- enabling the remote data cache (backed by SSD disks) is essential in a
compute-storage separated setup (a rough flag sketch follows this list):
https://impala.apache.org/docs/build/html/topics/impala_data_cache.html
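For illustration, the relevant startup flags look roughly like this (the
cache paths and quota are placeholders, not recommendations):

  # catalogd: minimal catalog topic for on-demand metadata
  catalogd --catalog_topic_mode=minimal ...
  # impalad (coordinators/executors): local catalog + remote data cache
  impalad --use_local_catalog=true \
          --data_cache=/ssd1/impala-cache,/ssd2/impala-cache:500GB ...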
What table format / file format are you planning to use?
If table format is Iceberg, make sure you use the latest Impala as we
continuously improve Impala's performance on Iceberg.
File format: Impala most efficiently works on Parquet files.
Avoid small file issues:
- choose proper partitioning for your data, i.e. avoid partitioning that is
too coarse-grained or too fine-grained. You probably want more than 200 MB
of data per partition, but less than 20 GB (see the sketch after this list).
- compact your tables regularly; for Iceberg tables Impala has the
OPTIMIZE statement:
https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
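To make this concrete, a minimal sketch (table name, columns, and the day()
granularity are only examples; adjust to your data volume):

  -- a date-based partition spec keeps partitions reasonably sized
  CREATE TABLE events (id BIGINT, ts TIMESTAMP, payload STRING)
  PARTITIONED BY SPEC (day(ts))
  STORED AS ICEBERG;

  -- periodically rewrite small files into larger ones
  OPTIMIZE TABLE events;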
I hope others chime in as well.
We would love to hear back about your experiences, and feel free to open
tickets for Impala if you run into any issue:
https://issues.apache.org/jira/projects/IMPALA/issues
Cheers,
Zoltan
On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:
> Dear Impala Team,
>
> I hope this message finds you well.
>
> I am currently planning to build a compute-storage separated architecture
> based on Apache Impala. In this setup:
>
> - Compute layer: Apache Impala will be used for SQL query execution.
> - Storage layer: Tencent Cloud Object Storage (COS) will serve as the
> backend storage.
> - Data ingestion: Kafka will be used to stream data into the system.
> - Monitoring & visualization: Grafana will be used to display
> operational and performance metrics.
>
> Could you please provide some recommendations and key considerations for
> such an architecture? Specifically, I would appreciate guidance on:
>
> - Best practices for integrating Impala with cloud object storage (COS).
> - Performance tuning tips for Impala in a disaggregated environment.
> - Any known limitations or compatibility issues when using COS as storage.
> - Recommended configurations for Kafka-to-Impala data pipelines.
> - Monitoring strategies for tracking query performance and resource usage
> in Grafana.
>
> Thank you very much for your support and advice. I look forward to your
> reply.
>
> Best regards