[jira] [Commented] (IMPALA-10254) Load data files via Iceberg for Iceberg Tables

ASF subversion and git services (Jira) Thu, 22 May 2025 01:03:04 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953355#comment-17953355 ]


ASF subversion and git services commented on IMPALA-10254:
----------------------------------------------------------

Commit ef174d3aa5405043fa5084cac83bafcdc1afd473 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ef174d3aa ]

IMPALA-12162: Checksum files before lock in INSERT

Collect file metadata - file checksums and ACID directory path - before
acquiring the table lock. The table lock doesn't prevent files from being
deleted from the underlying filesystem, and collecting this metadata can
take time, blocking other operations that depend on the table lock.

Fires InsertEvents with partial data if there are errors collecting
checksum or acidDirPath on individual files to provide best-effort
information. Hive defaults to empty string for these values when not
specified.

IMPALA-10254 has been resolved, so this change removes the exception for
FeIcebergTable and the associated TODO.

Change-Id: I18f9686f5d53cf1e7c384684c25427fb5353e2af
Reviewed-on: http://gerrit.cloudera.org:8080/22871
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
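The commit message combines two ideas: do the slow per-file filesystem work before taking the table lock, and fall back to an empty string for any file whose checksum or ACID directory cannot be read, rather than failing the whole event. Below is a minimal, self-contained Java sketch of that ordering under stated assumptions: `InsertCommitSketch`, `FileInfo`, `commitInsert`, and the `ReentrantLock` standing in for the table lock are hypothetical names, not Impala's catalog classes, and the checksum and ACID-directory lookups are passed in as functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical model of "collect metadata before the lock"; names are illustrative only.
class InsertCommitSketch {
    // Stand-in for the catalog's per-table lock (assumption).
    static final ReentrantLock TABLE_LOCK = new ReentrantLock();

    // Metadata gathered for one newly written file.
    static final class FileInfo {
        final String path;
        final String checksum;    // "" when it could not be computed
        final String acidDirPath; // "" when it could not be determined
        FileInfo(String path, String checksum, String acidDirPath) {
            this.path = path;
            this.checksum = checksum;
            this.acidDirPath = acidDirPath;
        }
    }

    // `trace` records whether the lock was held at each step, for illustration.
    static List<FileInfo> commitInsert(List<String> newFiles,
                                       Function<String, String> checksumFn,
                                       Function<String, String> acidDirFn,
                                       List<String> trace) {
        // 1. Slow filesystem work runs first, without holding the table lock.
        List<FileInfo> infos = new ArrayList<>();
        for (String f : newFiles) {
            infos.add(new FileInfo(f, orEmpty(checksumFn, f), orEmpty(acidDirFn, f)));
        }
        trace.add("collect:" + TABLE_LOCK.isHeldByCurrentThread());
        // 2. Only the short in-memory update runs under the lock.
        TABLE_LOCK.lock();
        try {
            trace.add("update:" + TABLE_LOCK.isHeldByCurrentThread());
            // ... build and fire the insert event from `infos` here ...
        } finally {
            TABLE_LOCK.unlock();
        }
        return infos;
    }

    // Best-effort lookup: a failure on one file (e.g. it was deleted
    // concurrently) yields "" instead of aborting the whole event.
    private static String orEmpty(Function<String, String> fn, String file) {
        try {
            String value = fn.apply(file);
            return value == null ? "" : value;
        } catch (RuntimeException e) {
            return "";
        }
    }
}
```

Keeping the lock scope to the in-memory update means a slow or failing lookup never blocks other operations waiting on the table lock. The trade-off is that a file can vanish between collection and commit, which is why each lookup degrades to the empty-string default the commit message attributes to Hive.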


> Load data files via Iceberg for Iceberg Tables
> ----------------------------------------------
>
>                 Key: IMPALA-10254
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10254
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently we still load the file descriptors of an Iceberg table via 
> recursive file listing.
> This lists too many files, e.g. metadata files, files that are being written 
> (can later throw checksum errors), files from aborted INSERTs, removed files, 
> etc.
> We should use the Iceberg API to load the file descriptors corresponding to 
> the table snapshot. Iceberg DataFiles might also already contain the split 
> offsets.
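The issue proposes taking the file list from the table snapshot instead of a recursive directory listing. With the real Iceberg Java API this would typically start from a table scan (for example `Table.newScan().planFiles()`, whose `FileScanTask.file()` returns a `DataFile` exposing `splitOffsets()`). The sketch below avoids that library dependency: `SnapshotFileLoader` and `DataFileEntry` are hypothetical stand-ins that model snapshot entries as paths with optional split offsets, to show why the snapshot-based set excludes files that a listing would pick up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained model; it does not use the real Iceberg library.
class SnapshotFileLoader {
    // Minimal stand-in for a data file entry read from a snapshot's manifests.
    static final class DataFileEntry {
        final String path;
        final List<Long> splitOffsets; // may be empty if the writer did not record them
        DataFileEntry(String path, List<Long> splitOffsets) {
            this.path = path;
            this.splitOffsets = splitOffsets;
        }
    }

    // File descriptors built from the snapshot alone: path -> split offsets.
    // No directory listing is involved.
    static Map<String, List<Long>> fileDescriptors(List<DataFileEntry> snapshotFiles) {
        Map<String, List<Long>> fds = new TreeMap<>();
        for (DataFileEntry e : snapshotFiles) {
            fds.put(e.path, e.splitOffsets);
        }
        return fds;
    }

    // Files a recursive listing would have returned that the snapshot does not
    // reference (metadata files, in-progress or aborted writes, removed files).
    static List<String> strayFiles(List<String> listing, List<DataFileEntry> snapshotFiles) {
        Map<String, List<Long>> fds = fileDescriptors(snapshotFiles);
        List<String> stray = new ArrayList<>();
        for (String p : listing) {
            if (!fds.containsKey(p)) {
                stray.add(p);
            }
        }
        return stray;
    }
}
```

Because the descriptors come from snapshot metadata rather than the filesystem, files from aborted INSERTs or writes still in progress never enter the set, and split offsets can be reused directly when the writer recorded them.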



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

