[ https://issues.apache.org/jira/browse/IMPALA-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953355#comment-17953355 ]
ASF subversion and git services commented on IMPALA-10254:
----------------------------------------------------------

Commit ef174d3aa5405043fa5084cac83bafcdc1afd473 in impala's branch refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ef174d3aa ]

IMPALA-12162: Checksum files before lock in INSERT

Collect file metadata - file checksums and ACID directory path - before
acquiring the table lock. Table lock doesn't prevent files from being
deleted from the underlying filesystem, and these operations can take
time, blocking other operations that depend on the table lock.

Fires InsertEvents with partial data if there are errors collecting
checksum or acidDirPath on individual files to provide best-effort
information. Hive defaults to empty string for these values when not
specified.

IMPALA-10254 has been resolved, so removes the exception for
FeIcebergTable and associated TODO.

Change-Id: I18f9686f5d53cf1e7c384684c25427fb5353e2af
Reviewed-on: http://gerrit.cloudera.org:8080/22871
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Load data files via Iceberg for Iceberg Tables
> ----------------------------------------------
>
>                 Key: IMPALA-10254
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10254
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently we still load the file descriptors of an Iceberg table via
> recursive file listing.
> This lists too many files, e.g. metadata files, files that are being written
> (can later throw checksum errors), files from aborted INSERTs, removed files,
> etc.
> We should use the Iceberg API to load the file descriptors corresponding to
> the table snapshot. Iceberg DataFiles might also already contain the split
> offsets.
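
The ordering described in the commit message above (collect per-file metadata first, take the table lock afterwards, and fall back to an empty string when a file cannot be read) can be illustrated with a small generic sketch. This is not Impala's code: the actual change is in Impala's Java frontend, and every name below (`FileMetadata`, `collect_metadata`, `insert_and_fire_event`, `fire_event`) is hypothetical. SHA-256 over local files is only a stand-in for whatever checksum the underlying filesystem provides.

```python
import hashlib
import threading
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileMetadata:
    path: str
    checksum: str  # empty string when the checksum could not be computed


def collect_metadata(paths):
    """Best-effort checksum collection; runs without holding any lock."""
    collected = []
    for path in paths:
        try:
            digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        except OSError:
            # The file vanished or is unreadable: use the empty default
            # instead of failing the whole operation.
            digest = ""
        collected.append(FileMetadata(str(path), digest))
    return collected


def insert_and_fire_event(table_lock, paths, fire_event):
    # Slow filesystem work happens first, outside the lock, so other
    # operations waiting on the lock are not blocked by it.
    metadata = collect_metadata(paths)
    # The lock is held only while the event is fired.
    with table_lock:
        fire_event(metadata)
    return metadata
```

A caller would pass something like `threading.Lock()` as `table_lock` and a callback as `fire_event`; a missing file yields an entry with `checksum == ""` rather than an exception, mirroring the best-effort behaviour the commit describes.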