[ 
https://issues.apache.org/jira/browse/IMPALA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341364#comment-17341364
 ] 

ASF subversion and git services commented on IMPALA-10656:
----------------------------------------------------------

Commit 7f1a3ff69b49331bf310d34e80dbdb6929833830 in impala's branch 
refs/heads/branch-4.0.0 from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7f1a3ff ]

IMPALA-10692: Fix acid insert when event polling is enabled

IMPALA-10656 broke inserts to ACID tables when HMS event polling
is enabled. The new partitions created during the insert had not yet
been added to the catalog table when createInsertEvents was called,
because the table is reloaded only after firing the events and
committing the transaction.

The fix is to create the INSERT event from the partition name
and the file set alone for new partitions. Existing partitions still
need the Partition object, because the event is added to the list of
the partition's in-flight events to detect self-events. New
partitions don't need self-event handling because:
- new partitions fire events only if the table is ACID
- ACID inserts don't fire any INSERT event visible to Impala, so
  it cannot cause an unnecessary metadata reload
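
The split between new and existing partitions described above can be
sketched roughly as follows. This is a minimal illustration, not the
code from the commit: the classes CatalogPartition and InsertEvent, the
map-based signature of createInsertEvents, and the locally assigned
event ids are all hypothetical stand-ins chosen to keep the example
self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InsertEventSketch {
    /** Hypothetical stand-in for a catalog partition and its in-flight event ids. */
    public static final class CatalogPartition {
        public final String name;
        public final List<Long> inFlightEventIds = new ArrayList<>();
        public CatalogPartition(String name) { this.name = name; }
    }

    /** Hypothetical INSERT event: a partition name plus the files written to it. */
    public static final class InsertEvent {
        public final long id;
        public final String partitionName;
        public final List<String> files;
        public InsertEvent(long id, String partitionName, List<String> files) {
            this.id = id;
            this.partitionName = partitionName;
            this.files = files;
        }
    }

    /**
     * Builds one event per partition that received files. The event itself
     * needs only the partition name and the file set, so a partition that is
     * not in the catalog yet still gets an event. Only partitions already in
     * the catalog record the event id for self-event detection.
     * Assumption: ids are assigned locally here, starting at firstEventId.
     */
    public static List<InsertEvent> createInsertEvents(
            Map<String, CatalogPartition> catalogPartitions,
            Map<String, List<String>> filesByPartition,
            long firstEventId) {
        List<InsertEvent> events = new ArrayList<>();
        long nextId = firstEventId;
        for (Map.Entry<String, List<String>> entry : filesByPartition.entrySet()) {
            InsertEvent event = new InsertEvent(nextId++, entry.getKey(), entry.getValue());
            events.add(event);
            CatalogPartition existing = catalogPartitions.get(entry.getKey());
            if (existing != null) {
                // Existing partition: remember the event so it can be
                // recognized as a self-event later.
                existing.inFlightEventIds.add(event.id);
            }
            // New partition: no catalog object exists yet, and no
            // self-event tracking is attempted.
        }
        return events;
    }
}
```

In this sketch the lookup in catalogPartitions is the only place where
the catalog object is needed, which is why a missing partition no longer
blocks event creation.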

ACID inserts from Hive work differently: they always cause an
ALTER_TABLE or ALTER_PARTITION event, which Impala detects and
which leads to a metadata reload. I think this situation is hacky
at best, because these events come before the COMMIT event (which
Impala currently ignores), so Impala may reload the table too
early (before the commit is finished).

Testing:
- added acid tables to TestEventProcessing.test_self_events

Change-Id: I8c2d0702232538a746410539ad55f87b7fde57e7
Reviewed-on: http://gerrit.cloudera.org:8080/17380
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Csaba Ringhofer <[email protected]>


> Fire insert events before commit
> --------------------------------
>
>                 Key: IMPALA-10656
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10656
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>
> Currently Impala commits an insert first, then reloads the table from HMS,
> and generates the insert events based on the difference between the two
> snapshots (e.g. which files were not present in the old snapshot but are
> present in the new one). Hive replication expects the insert events before
> the commit, so this may lead to issues there.
> The solution is to collect the new files during the insert in the backend,
> and send the insert events based on this file set.
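
The two orderings in the issue description can be contrasted with a
small sketch. It is only an illustration under stated assumptions: the
class InsertFileSetSketch, both method names, and the step labels are
hypothetical, and file sets are modeled as plain sets of path strings
rather than anything from Impala's catalog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class InsertFileSetSketch {
    /**
     * Old approach: the new files are known only after the commit and the
     * reload, as the difference between the two table snapshots.
     */
    public static Set<String> diffSnapshots(Set<String> before, Set<String> after) {
        Set<String> added = new TreeSet<>(after);
        added.removeAll(before);
        return added;
    }

    /**
     * New approach: the backend reports the files it wrote, so the insert
     * events can be produced before the commit. Returns the ordered steps
     * (hypothetical labels) to make the ordering visible.
     */
    public static List<String> stepsWithEventsBeforeCommit(Set<String> filesWrittenByBackend) {
        List<String> steps = new ArrayList<>();
        // Sorted only to make the output deterministic.
        for (String file : new TreeSet<>(filesWrittenByBackend)) {
            steps.add("INSERT_EVENT " + file);
        }
        steps.add("COMMIT");
        steps.add("RELOAD_TABLE");
        return steps;
    }
}
```

The point of the second method is that nothing in it depends on a
reloaded snapshot, so the events no longer have to wait for the commit.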


