[
https://issues.apache.org/jira/browse/SENTRY-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Kolbasov updated SENTRY-1895:
---------------------------------------
Description:
As shown in HIVE-16738, notification IDs generated by Hive may be non-unique,
and different events may share the same ID. This creates various problems for
Sentry/Hive interaction, and we should find a short-term solution until it is
fixed in Hive.
The issue was addressed in SENTRY-1803 by removing the primary-key constraint
on the notification ID, which allows duplicate IDs to be stored. But this
creates other problems:
1. We are using the primary key constraint to prevent multiple instances of
Sentry from processing the same notifications multiple times.
2. We are using max(notificationId) to find the last processed event. When the
field is a primary key, this operation is an index scan; when it isn't, it is
a full table scan, which is more expensive.
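
A minimal sketch of the two points above, using Python's sqlite3 as a stand-in
for the Sentry backing database. The table name, column names and helper
functions are hypothetical and are not taken from the Sentry code; the query
plan text is specific to SQLite and varies between versions.

```python
import sqlite3


def make_db(with_primary_key):
    """Create an in-memory table, with or without a primary key on the ID."""
    conn = sqlite3.connect(":memory:")
    key = "INTEGER PRIMARY KEY" if with_primary_key else "INTEGER"
    conn.execute(
        f"CREATE TABLE notifications (notification_id {key}, payload TEXT)"
    )
    return conn


def try_claim(conn, notification_id, payload):
    """Return True if the row was stored, False if the ID was already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notifications VALUES (?, ?)",
                (notification_id, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # Only raised when the primary-key constraint is present.
        return False


def max_id_plan(conn):
    """Return SQLite's query plan text for the max(notification_id) lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT max(notification_id) FROM notifications"
    ).fetchall()
    return " ".join(row[3] for row in rows)
```

With make_db(True), a second try_claim() for the same ID returns False, so a
second instance can tell that the notification was already recorded. With
make_db(False), both calls succeed and the same ID is stored twice. Comparing
max_id_plan() for the two tables shows how the lookup strategy differs once
the key is gone.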
We also have a few other problems caused by duplicate IDs that are unrelated
to SENTRY-1803 and not addressed by it:
1. There is a synchronization mechanism between HMS and Sentry which ensures
that a given event is processed. This doesn't work in the presence of duplicate
IDs.
2. Some events may be missed due to the way they are processed.
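
The description does not say exactly how events are missed. One possible
scenario, sketched below under the assumption that the consumer polls for
events with an ID strictly greater than the last processed one, is that a
later event reusing an already-seen ID is never picked up. The event log, the
event names and poll_new_events() are hypothetical illustrations, not the
Sentry or HMS API.

```python
def poll_new_events(event_log, last_processed_id):
    """Return events whose ID is strictly greater than last_processed_id."""
    return [event for event in event_log if event[0] > last_processed_id]


# Each event is an (id, description) pair.
event_log = [(1, "CREATE_TABLE a"), (2, "DROP_TABLE a")]

first_batch = poll_new_events(event_log, 0)
last_processed_id = max(event[0] for event in first_batch)

# A new event is logged with an ID that has already been used.
event_log.append((2, "CREATE_TABLE b"))

# The duplicate-ID event is not newer than last_processed_id, so this
# poll returns nothing and the event is silently skipped.
second_batch = poll_new_events(event_log, last_processed_id)
```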
was: As shown in https://issues.apache.org/jira/browse/HIVE-16738,
notification IDs generated by Hive may be non-unique and there may be cases
with different events sharing the same ID. This creates various problems for
Sentry/Hive interaction and we should find some short-term solution until it
is fixed in Hive.
> Sentry should handle the case of multiple notifications with the same ID
> ------------------------------------------------------------------------
>
> Key: SENTRY-1895
> URL: https://issues.apache.org/jira/browse/SENTRY-1895
> Project: Sentry
> Issue Type: Sub-task
> Components: Sentry
> Affects Versions: 2.0.0
> Reporter: Alexander Kolbasov
> Assignee: Sergio Peña
> Fix For: 2.0.0
>