[ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector reassigned SENTRY-1891:
-------------------------------------

    Assignee: Vadim Spector

> SentryPlugin triggers full update due to concurrency bug
> --------------------------------------------------------
>
>                 Key: SENTRY-1891
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1891
>             Project: Sentry
>          Issue Type: Bug
>            Reporter: Vadim Spector
>            Assignee: Vadim Spector
>
> Sentry server can trigger full update to NameNode for no good reason, in high 
> load cases, due to the bug in concurrency handling. Full update is always 
> done at the initialization time and for large number of permissions and/or 
> hdfs paths it takes lots of time and significantly increases (temporarily) 
> heap size. It is performed during normal operations only if Sentry server 
> decides that it has gaps in the partial updates' sequence numbers, so it 
> restores the entire snapshot. Full update during normal operations is a 
> highly disruptive event, leading to huge increasw in heap size, system 
> slowdown, and even crash. Below is the explanation of the problem:
> ====================================================================
> *SentryPlugin.java* has multiple permission update methods, such as 
> _onDropSentryRole()_, _onDropSentryPrivilege()_, 
> _onAlterSentryRoleDeleteGroups()_, etc. They, of course, can be called 
> concurrently. And they all delegate work to 
> _permsUpdater.handleUpdateNotification(update)_ call, which, in turn, 
> serializes those requests by turning them into the Runnable tasks and 
> submitting them into a single-threaded thread pool. Each job ends up 
> appending an update to the updates list. So far so good, all updates are 
> serialized.
> However, each of those methods first creates _PermissionsUpdate_ object with 
> auto-incremented sequence number, and only then passes this object to 
> _permsUpdate.handleUpdateNotification(update)_. Here is typical code snippet:
> {quote}PermissionsUpdate update = new 
> PermissionsUpdate(permSeqNum.incrementAndGet(), false);
> // what if another permission update thread preempts this one right here and 
> starts and finishes the whole method ???
> update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, 
> privilege.getAction().toUpperCase());
> // ... or here ???
> permsUpdater.handleUpdateNotification(update); 
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}
> The problem is that sequence number assignment to _PermissionsUpdate_ object 
> and appending this object to the updates list (by calling 
> _permsUpdater.handleUpdateNotification(update)_ ) must be an atomic pair of 
> operations, and it is not. In multi-threaded environment sooner or later we 
> can end up with appending updates into the updates list with  out of order 
> sequence numbers.
> The problem is not that the updates may end up appended to the updates list 
> not in the order of their arrival. We never guaranteed the right order for 
> concurrent requests coming from different clients. The problem is that they 
> end up appended to the list with their sequence numbers out of order, e.g. 
> #101, #100, instead of #100, #101. Then handleUpdateNotification() method has 
> the following lines:
> {quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == 
> update.getSeqNum();{quote}
> Suppose lastSeenSeqNum was 99, and the next appended update is #101. Then, 
> obviously, 99+1 != 101, so editNotMissed is set to false. It triggers full 
> update logic a few lines of code later:
> {quote}if (imageRetreiver != null) \{
>                try \{
>                  toUpdate = 
> imageRetreiver.retrieveFullImage(update.getSeqNum());
>                \} catch (Exception e) \{
>                  LOGGER.warn("failed to retrieve full image: ", e);
>                \}
>                updateable = updateable.updateFull(toUpdate);
> \}{quote}
> Since imageRetriever != null for permissions updater, all the hell breaks 
> loose.
> It can be fixed in _SentryPlugin.java_ for all permission update methods as 
> follows:
> {quote}PermissionsUpdate update;
> synchronized (permsUpdateLock) \{ // now sequence number increment and adding 
> update to the list are performed atomically
>   update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
>    update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, 
> privilege.getAction().toUpperCase());
>    permsUpdater.handleUpdateNotification(update);
> \}
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
> {quote}
>  _handleUpdateNotification()_ is always fast since it submits real work to 
> thread pool; so, it does not block, and introducing additional 
> synchronization should not affect concurrency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to