Thejas M Nair commented on HIVE-18940:

For replication purposes, and perhaps for capturing Sentry delta updates as 
well, the EVENT_ID values have to become visible in commit order.
For example, if the row with EVENT_ID 5 has been written and then consumed by 
the replication program, that program would from then on only look for rows 
where EVENT_ID > 5. So if two concurrent transactions are writing new rows and 
the one with EVENT_ID 5 commits before the one with EVENT_ID 4, then EVENT_ID 4 
would be missed.
Holes would be OK. What is not OK is for another application to see the row 
with EVENT_ID 5 become visible before the row with EVENT_ID 4.
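
To make the failure mode above concrete, here is a minimal sketch in Python. It 
is only a simulation under stated assumptions: the set `committed` stands in for 
the rows visible to other transactions, and `poll` and `simulate` are 
hypothetical names, not part of Hive or of any replication tool.

```python
def poll(visible_ids, last_seen):
    """Return the visible EVENT_IDs greater than last_seen, in ascending order.

    This mimics a consumer that runs "WHERE EVENT_ID > last_seen".
    """
    return sorted(eid for eid in visible_ids if eid > last_seen)


def simulate():
    committed = set()   # EVENT_IDs whose transactions have committed (assumed model)
    last_seen = 3       # the consumer has already processed everything up to 3
    consumed = []

    # The transaction holding EVENT_ID 5 commits first.
    committed.add(5)
    batch = poll(committed, last_seen)
    consumed.extend(batch)
    if batch:
        last_seen = batch[-1]

    # The transaction holding EVENT_ID 4 commits afterwards.
    committed.add(4)
    batch = poll(committed, last_seen)   # now asks for EVENT_ID > 5
    consumed.extend(batch)
    if batch:
        last_seen = batch[-1]

    missed = sorted(committed - set(consumed))
    return consumed, missed


if __name__ == "__main__":
    consumed, missed = simulate()
    print("consumed:", consumed)
    print("missed:", missed)
```

In this model the consumer advances its watermark to 5 on the first poll, so the 
later commit of EVENT_ID 4 never matches its filter.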

I believe the use of a database auto-increment field was considered in 
HIVE-16886, and it did not meet this criterion. 

cc [~anishek]

> Hive notifications serialize all write DDL operations
> -----------------------------------------------------
>                 Key: HIVE-18940
>                 URL: https://issues.apache.org/jira/browse/HIVE-18940
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Alexander Kolbasov
>            Priority: Major
> The implementation of DbNotificationListener uses a single row to store 
> the current notification ID and uses {{SELECT FOR UPDATE}} to lock the row. 
> This serializes all write DDL operations, which isn't good.
> We should consider using database auto-increment for the notification ID 
> instead. On MySQL/InnoDB in particular, it is supported natively with 
> relatively lightweight locking. 
> This creates a potential issue for consumers, though, because such IDs may 
> have holes. There are two types of holes: transient holes for transactions 
> that have not committed yet but will commit shortly, and permanent holes for 
> transactions that fail. Consumers need to deal with both. It may be useful to 
> add a DB-generated timestamp as well to assist in recovery from holes.
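
One way a consumer might deal with the two kinds of holes described in the 
quoted issue is sketched below in Python. This is a heuristic under stated 
assumptions, not something the issue specifies: `advance` and `hole_timeout` are 
hypothetical names, each visible row is assumed to carry a DB-generated 
timestamp, and a gap is treated as permanent once the row after it is older 
than the timeout. A transaction that stays open longer than the timeout would 
still be missed.

```python
def advance(rows, last_seen, now, hole_timeout):
    """Decide which visible rows a consumer can safely process.

    rows:         dict mapping EVENT_ID -> DB-generated timestamp (assumed column)
    last_seen:    highest EVENT_ID already processed
    now:          current time, in the same units as the timestamps
    hole_timeout: how long to wait for a transient hole to fill (assumed knob)

    Returns (deliverable_ids, new_last_seen).
    """
    deliverable = []
    for eid in sorted(e for e in rows if e > last_seen):
        if eid == last_seen + 1:
            # Contiguous with what we have already seen: no hole to worry about.
            pass
        elif now - rows[eid] >= hole_timeout:
            # A gap precedes this row, but the row is old enough that the
            # missing IDs are assumed to be permanent holes (failed transactions).
            pass
        else:
            # A gap precedes this row and it may still be transient: stop here
            # and retry on the next poll.
            break
        deliverable.append(eid)
        last_seen = eid
    return deliverable, last_seen


if __name__ == "__main__":
    # EVENT_ID 5 is visible but 4 is not yet: wait instead of skipping ahead.
    print(advance({5: 100.0}, last_seen=3, now=101.0, hole_timeout=10.0))
    # EVENT_ID 4 has now committed: both rows are delivered in order.
    print(advance({4: 100.5, 5: 100.0}, last_seen=3, now=101.0, hole_timeout=10.0))
```

The cost of this approach is added latency whenever a hole appears, since the 
consumer holds back newer rows until the gap fills or the timeout expires.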

