adrians commented on issue #12554: URL: https://github.com/apache/iceberg/issues/12554#issuecomment-3113576724
I found a similar problem when using Impala to query a table migrated with `rewrite_table_path`. Impala reads the manifest-list and takes the list of manifest-files and their sizes from it. Because the manifest-files were rewritten for the new path, their content changed and they no longer have exactly the same size as in the original table, but the manifest-list does not contain the updated sizes. Impala therefore throws an error: the actual manifest-file size doesn't match the expected size declared in the manifest-list. The Hive and Spark query engines don't check the manifest-file length, but Impala does.

Impala error message:

```
AnalysisException: Failed to load metadata for table: '...'
CAUSED BY: TableLoadingException: Could not load table ... from catalog
CAUSED BY: TException: TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, error_msgs:[IcebergTableLoadingException: Error loading metadata for Iceberg table ...
CAUSED BY: UncheckedIOException: Failed to open input stream for file ...-m2.avro: java.io.IOException: Expected to read 730455 bytes, but only 729310 bytes read.
CAUSED BY: IOException: Expected to read 730455 bytes, but only 729310 bytes read.]), lookup_status:OK)
```

It seems that the logical flow should be:

```mermaid
flowchart LR
    A["<b>Rewrite delete files</b><br>- Update paths for data-files"]-->B["<b>Rewrite the manifest files</b><br>- Update paths for data-files<br>- Update paths and sizes for delete-files"]-->C["<b>Rewrite the manifest-lists</b><br>- Update paths and sizes for manifest-files"]
```
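To illustrate the check that Impala effectively performs, here is a minimal Python sketch that compares the length declared for each manifest-file against the size actually found in storage. It does not use the Iceberg or Impala APIs: `ManifestEntry`, `Mismatch`, `find_length_mismatches` and the `size_of` callback are hypothetical names made up for this example, and the file names in the demo are placeholders. Only the two byte counts are taken from the error message above.

```python
from typing import Callable, Iterable, List, NamedTuple


class ManifestEntry(NamedTuple):
    """Simplified, hypothetical view of one manifest-list row."""
    path: str
    declared_length: int  # length recorded in the manifest-list


class Mismatch(NamedTuple):
    """A manifest-file whose real size differs from the declared one."""
    path: str
    declared_length: int
    actual_length: int


def find_length_mismatches(
    entries: Iterable[ManifestEntry],
    size_of: Callable[[str], int],
) -> List[Mismatch]:
    """Return every entry whose declared length differs from size_of(path).

    size_of is an assumption of this sketch: any callable that returns the
    real size in bytes of the file at the given path (for local files,
    os.path.getsize would do).
    """
    mismatches = []
    for entry in entries:
        actual = size_of(entry.path)
        if actual != entry.declared_length:
            mismatches.append(
                Mismatch(entry.path, entry.declared_length, actual)
            )
    return mismatches


if __name__ == "__main__":
    # Placeholder file names; the byte counts come from the Impala error.
    declared = [
        ManifestEntry("example-m1.avro", 1000),
        ManifestEntry("example-m2.avro", 730455),
    ]
    actual_sizes = {"example-m1.avro": 1000, "example-m2.avro": 729310}

    for m in find_length_mismatches(declared, actual_sizes.__getitem__):
        print(f"{m.path}: declared {m.declared_length}, actual {m.actual_length}")
```

Running a check like this after the rewrite would flag every manifest-file whose size changed, which are the entries the rewritten manifest-list would need to update.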
