[ 
https://issues.apache.org/jira/browse/SENTRY-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137045#comment-16137045
 ] 

Sergio Peña commented on SENTRY-1895:
-------------------------------------

Doesn't option 3 have the same issue as SENTRY-1803 when two Sentry servers 
are processing the same notification? To deal with a duplicated ID, Sentry 
would need to check whether the ID is already present, increment it, and then 
persist it. If two servers are writing at the same time, will one of them have 
to retry with the new increment?

I think option 2 would help if we compute a hash of the rest of the 
notification and store both the ID and the hash in a table. If the two columns 
together form the primary key, then a second server attempting to process and 
write the same notification will fail.
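
A minimal sketch of that idea, using Python's sqlite3 module purely for illustration rather than Sentry's actual persistence layer. The table name, the column names, the choice of SHA-1, and the helper functions are all hypothetical; the only point is that a composite primary key on (ID, hash) rejects a second write of the same notification while still accepting a different event that reuses the ID.

```python
import hashlib
import sqlite3


def notification_hash(payload: str) -> str:
    """Hash the notification content other than its ID (SHA-1 is an assumption)."""
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema, not Sentry's real one: the ID and the hash
    # together form the primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_notification ("
        " notification_id INTEGER NOT NULL,"
        " notification_hash TEXT NOT NULL,"
        " PRIMARY KEY (notification_id, notification_hash))"
    )


def try_record(conn: sqlite3.Connection, notification_id: int, payload: str) -> bool:
    """Return True if this caller recorded the notification first,
    False if the same (ID, hash) pair was already persisted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_notification VALUES (?, ?)",
                (notification_id, notification_hash(payload)),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: someone already wrote this notification.
        return False
```

With an in-memory database, recording ID 42 with some payload succeeds the first time, a second attempt with the same ID and payload returns False, and a different payload under the same ID 42 is still accepted. Whether a real deployment would see the same behavior depends on the backing database and transaction settings.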

> Sentry should handle the case of multiple notifications with the same ID
> ------------------------------------------------------------------------
>
>                 Key: SENTRY-1895
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1895
>             Project: Sentry
>          Issue Type: Sub-task
>          Components: Sentry
>    Affects Versions: 2.0.0
>            Reporter: Alexander Kolbasov
>            Assignee: Sergio Peña
>             Fix For: 2.0.0
>
>
> As shown in HIVE-16886, notification IDs generated by Hive may be non-unique 
> and there may be cases with different events sharing the same ID. This creates 
> various problems for Sentry/Hive interaction and we should find some 
> short-term solution until it is fixed in Hive.
> The issue was addressed in SENTRY-1803 by removing the primary-key constraint 
> on the notification ID, which allows duplicate IDs to be stored. But this 
> creates other problems:
> 1. We are using the primary key constraint to prevent multiple instances of 
> Sentry from processing the same notifications multiple times.
> 2. We are using max(notificationId) to find the last processed event. When 
> the field is a primary key, this operation is an index scan, but when it 
> isn't, it is a full table scan which is more expensive.
> We also have a few other problems caused by duplicate IDs which are not 
> related and not addressed by SENTRY-1803:
> 1. There is a synchronization mechanism between HMS and Sentry which ensures 
> that a given event is processed. This doesn't work in the presence of 
> duplicate IDs.
> 2. Some events may be missed due to the way they are processed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
