Thanks for reaching out. Please file a ticket about the error with stack traces and all necessary information (snippets from the catalogd logs) so we can debug this further.
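To make the flag suggestion concrete: forcing serial metadata loading via the catalogd backend flags max_hdfs_partitions_parallel_load and max_nonhdfs_partitions_parallel_load would look roughly like this. This is only a sketch; how flags are passed to catalogd depends on your deployment (systemd unit, cluster manager, startup script, etc.), and setting them to 1 is a diagnostic step, not a verified fix:

```shell
# Hypothetical catalogd invocation -- adapt to however your deployment
# starts the catalog service. Setting both parallel-load flags to 1
# serializes file metadata loading, which helps check whether the
# failure is concurrency-related.
catalogd \
  --max_hdfs_partitions_parallel_load=1 \
  --max_nonhdfs_partitions_parallel_load=1
```

If the table loads successfully with these settings, that would point toward a concurrency issue in the loading path; if it still fails, the problem is more likely in how the cosn:// filesystem reports block locations.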
> I tried to apply a workaround by setting metadata_loader_parallelism=1 via ALTER TABLE

We don't have such a table property. Was it suggested by Gemini?

> Question: Is there a way to globally set metadata_loader_parallelism=1 for the entire Catalogd, or perhaps a flag to force Impala to ignore block location errors for object storage during the initial load?

I suspect a different issue than a concurrency bug, but FWIW Catalogd has the backend flags "max_hdfs_partitions_parallel_load" and "max_nonhdfs_partitions_parallel_load", which you can set:
https://github.com/apache/impala/blob/master/be/src/catalog/catalog.cc#L39-L45

These affect file metadata loading of non-Iceberg tables as well, but they can be useful for validating the bug.

We have improved Iceberg table loading since 4.5 and fixed a few issues, so there is a chance that the issue you ran into is already fixed. Would it be possible for you to try out the current Impala master? I think it is time for us to do another upstream release, and it would be nice if we could make sure your use case works well with the newest release.

Cheers,
Zoltan

On Mon, Jan 5, 2026 at 1:35 PM 汲广熙 <[email protected]> wrote:

> Subject: Re: [Bug?] Impala 4.5 Iceberg table loading failure on Tencent Cloud COS (cosn://)
>
> Hi,
>
> Thank you for the suggestions. I have some critical updates regarding the "TableLoadingException: failed to load 1 paths" issue on Tencent Cloud COS (cosn://).
>
> I have already enabled Local Catalog Mode (--use_local_catalog=true and minimal topic mode), but the issue persists.
>
> Based on an analysis provided by Gemini (Google's AI model) regarding the stack trace, it was suggested that the error in ParallelFileMetadataLoader.loadInternal might be caused by concurrency issues when handling object storage paths. Following this logic, I tried to apply a workaround by setting metadata_loader_parallelism=1 via ALTER TABLE.
> However, the command failed with the same AnalysisException and TableLoadingException. It seems Impala falls into a deadlock state: I cannot modify the table properties to fix the loading logic, because Impala fails to load the table metadata even during the ALTER TABLE analysis phase.
>
> Observations:
>
> - The error originates from ParallelFileMetadataLoader.loadInternal.
> - Even with Local Catalog enabled, the loader seems unable to handle the missing block location info from the cosn:// driver.
> - Standard Parquet tables work fine; this appears specific to the Iceberg metadata loading path in Impala 4.5.
>
> Question: Is there a way to globally set metadata_loader_parallelism=1 for the entire Catalogd, or perhaps a flag to force Impala to ignore block location errors for object storage during the initial load?
>
> I am happy to provide a full stack trace or open a JIRA ticket, as this currently makes Iceberg on COS unusable in our environment.
>
> ============================================================================================================
> When I run a simple SELECT * or INSERT INTO query on this table, I get the following error:
>
> AnalysisException: Failed to load metadata for table: 'iceberg_cos_employee_test'
> ...
> IcebergTableLoadingException: Error loading metadata for Iceberg table cosn://cdnlogtest-1252412955/impala_test_db/iceberg_cos_employee_test
> ...
> Loading file and block metadata for 1 paths for table ...: failed to load 1 paths.
> ============================================================================================================
>
> Best regards
>
> ---------- Original message ----------
> From: Zoltán Borók-Nagy <[email protected]>
> Sent: December 23, 2025, 23:52
> To: dev <[email protected]>
> Subject: Re: Advice and Considerations for Building an Impala Compute-Storage Separated Architecture
>
> Hi,
>
> Just a few tips off the top of my head:
>
> - Use dedicated coordinators and executors; a rule of thumb for the coordinator:executor ratio is 1:50, though for HA you probably want more than one coordinator.
> - Use local catalog mode (aka on-demand metadata):
>   https://impala.apache.org/docs/build/html/topics/impala_metadata.html
> - Enabling the remote data cache (with SSD disks) is essential in a compute-storage separated setup:
>   https://impala.apache.org/docs/build/html/topics/impala_data_cache.html
>
> What table format / file format are you planning to use?
> If the table format is Iceberg, make sure you use the latest Impala, as we continuously improve Impala's performance on Iceberg.
> File format: Impala works most efficiently on Parquet files.
>
> Avoid small-file issues:
>
> - Choose proper partitioning for your data, i.e. avoid too coarse-grained and too fine-grained partitioning. You probably want more than 200 MB of data per partition, but probably less than 20 GB.
> - Compact your tables regularly; for Iceberg tables Impala has the OPTIMIZE statement:
>   https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
>
> I hope others chime in as well.
>
> We would love to hear back about your experiences, and feel free to open tickets for Impala if you run into any issue:
> https://issues.apache.org/jira/projects/IMPALA/issues
>
> Cheers,
> Zoltan
>
> On Tue, Dec 23, 2025 at 8:23 AM 汲广熙 <[email protected]> wrote:
>
> > Dear Impala Team,
> >
> > I hope this message finds you well.
> > I am currently planning to build a compute-storage separated architecture based on Apache Impala. In this setup:
> >
> > - Compute layer: Apache Impala will be used for SQL query execution.
> > - Storage layer: Tencent Cloud Object Storage (COS) will serve as the backend storage.
> > - Data ingestion: Kafka will be used to stream data into the system.
> > - Monitoring & visualization: Grafana will be used to display operational and performance metrics.
> >
> > Could you please provide some recommendations and key considerations for such an architecture? Specifically, I would appreciate guidance on:
> >
> > - Best practices for integrating Impala with cloud object storage (COS).
> > - Performance tuning tips for Impala in a disaggregated environment.
> > - Any known limitations or compatibility issues when using COS as storage.
> > - Recommended configurations for Kafka-to-Impala data pipelines.
> > - Monitoring strategies for tracking query performance and resource usage in Grafana.
> >
> > Thank you very much for your support and advice. I look forward to your reply.
> >
> > Best regards
