[ 
https://issues.apache.org/jira/browse/IMPALA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333267#comment-17333267
 ] 

ASF subversion and git services commented on IMPALA-10656:
----------------------------------------------------------

Commit c65d7861d9ae28f6fc592727ff699a8155dcda2c in impala's branch 
refs/heads/dependabot/pip/infra/python/deps/py-1.10.0 from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c65d786 ]

IMPALA-10656: Fire insert events before commit

Before this fix, Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots (e.g. which files were not present in the old
snapshot but are present in the new one).
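
As a rough illustration of the snapshot-difference approach described
above, here is a minimal Python sketch. It is not Impala code: the
function name, the file names, and the use of plain lists as snapshots
are all assumptions made for the example.

```python
def new_files_by_diff(old_snapshot, new_snapshot):
    """Return the files present in new_snapshot but absent from old_snapshot.

    Hypothetical helper: a snapshot here is just a list of file names.
    """
    return sorted(set(new_snapshot) - set(old_snapshot))


# Hypothetical file listings taken before and after the committed insert.
before = ["part-0.parq", "part-1.parq"]
after = ["part-0.parq", "part-1.parq", "part-2.parq"]

# The insert events would be derived from this difference, i.e. only
# after the commit and the table reload have already happened.
added = new_files_by_diff(before, after)
```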

Hive replication expects the insert events before the commit, so firing
them after the commit may lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables

This patch unifies the paths above and collects all information about
the created files regardless of the table type.
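
The ordering this change aims for can be sketched as follows. This is a
simplified Python model, not the actual backend or catalog code: the
class `InsertSketch`, its methods, and the partition and file names are
hypothetical, and the "events" are just tuples appended to a log.

```python
from collections import defaultdict


class InsertSketch:
    """Hypothetical model of an insert that records the files it creates."""

    def __init__(self):
        self.created_files = defaultdict(list)  # partition name -> file names
        self.log = []  # ordered record of what happened

    def record_file(self, partition, path):
        # Called while the insert is writing data, for any table type.
        self.created_files[partition].append(path)

    def fire_insert_events(self):
        # Events are built from the collected file set, so no
        # before/after snapshot comparison is needed.
        for partition, files in sorted(self.created_files.items()):
            self.log.append(("insert_event", partition, tuple(files)))

    def commit(self):
        self.log.append(("commit",))

    def finish(self):
        # Insert events are fired first, the commit comes last.
        self.fire_insert_events()
        self.commit()
        return self.log


sketch = InsertSketch()
sketch.record_file("year=2021", "b.parq")
sketch.record_file("year=2020", "a.parq")
log = sketch.finish()
```

In this model every insert event appears in `log` before the final
`("commit",)` entry, which is the order Hive replication is said to
expect.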

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Reviewed-on: http://gerrit.cloudera.org:8080/17313
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Fire insert events before commit
> --------------------------------
>
>                 Key: IMPALA-10656
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10656
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>
> Currently Impala commits an insert first, then reloads the table from HMS,
> and generates the insert events based on the difference between the two
> snapshots (e.g. which files were not present in the old snapshot but are
> present in the new one). Hive replication expects the insert events before
> the commit, so this may lead to issues there.
> The solution is to collect the new files during the insert in the backend,
> and send the insert events based on this file set.


