[
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vadim Spector reassigned SENTRY-1891:
-------------------------------------
Assignee: Vadim Spector
> SentryPlugin triggers full update due to concurrency bug
> --------------------------------------------------------
>
> Key: SENTRY-1891
> URL: https://issues.apache.org/jira/browse/SENTRY-1891
> Project: Sentry
> Issue Type: Bug
> Reporter: Vadim Spector
> Assignee: Vadim Spector
>
> Sentry server can trigger full update to NameNode for no good reason, in high
> load cases, due to the bug in concurrency handling. Full update is always
> done at the initialization time and for large number of permissions and/or
> hdfs paths it takes lots of time and significantly increases (temporarily)
> heap size. It is performed during normal operations only if Sentry server
> decides that it has gaps in the partial updates' sequence numbers, so it
> restores the entire snapshot. Full update during normal operations is a
> highly disruptive event, leading to huge increasw in heap size, system
> slowdown, and even crash. Below is the explanation of the problem:
> ====================================================================
> *SentryPlugin.java* has multiple permission update methods, such as
> _onDropSentryRole()_, _onDropSentryPrivilege()_,
> _onAlterSentryRoleDeleteGroups()_, etc. They, of course, can be called
> concurrently. And they all delegate work to
> _permsUpdater.handleUpdateNotification(update)_ call, which, in turn,
> serializes those requests by turning them into the Runnable tasks and
> submitting them into a single-threaded thread pool. Each job ends up
> appending an update to the updates list. So far so good, all updates are
> serialized.
> However, each of those methods first creates _PermissionsUpdate_ object with
> auto-incremented sequence number, and only then passes this object to
> _permsUpdate.handleUpdateNotification(update)_. Here is typical code snippet:
> {quote}PermissionsUpdate update = new
> PermissionsUpdate(permSeqNum.incrementAndGet(), false);
> // what if another permission update thread preempts this one right here and
> starts and finishes the whole method ???
> update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
> privilege.getAction().toUpperCase());
> // ... or here ???
> permsUpdater.handleUpdateNotification(update);
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}
> The problem is that sequence number assignment to _PermissionsUpdate_ object
> and appending this object to the updates list (by calling
> _permsUpdater.handleUpdateNotification(update)_ ) must be an atomic pair of
> operations, and it is not. In multi-threaded environment sooner or later we
> can end up with appending updates into the updates list with out of order
> sequence numbers.
> The problem is not that the updates may end up appended to the updates list
> not in the order of their arrival. We never guaranteed the right order for
> concurrent requests coming from different clients. The problem is that they
> end up appended to the list with their sequence numbers out of order, e.g.
> #101, #100, instead of #100, #101. Then handleUpdateNotification() method has
> the following lines:
> {quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() ==
> update.getSeqNum();{quote}
> Suppose lastSeenSeqNum was 99, and the next appended update is #101. Then,
> obviously, 99+1 != 101, so editNotMissed is set to false. It triggers full
> update logic a few lines of code later:
> {quote}if (imageRetreiver != null) \{
> try \{
> toUpdate =
> imageRetreiver.retrieveFullImage(update.getSeqNum());
> \} catch (Exception e) \{
> LOGGER.warn("failed to retrieve full image: ", e);
> \}
> updateable = updateable.updateFull(toUpdate);
> \}{quote}
> Since imageRetriever != null for permissions updater, all the hell breaks
> loose.
> It can be fixed in _SentryPlugin.java_ for all permission update methods as
> follows:
> {quote}PermissionsUpdate update;
> synchronized (permsUpdateLock) \{ // now sequence number increment and adding
> update to the list are performed atomically
> update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
> update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
> privilege.getAction().toUpperCase());
> permsUpdater.handleUpdateNotification(update);
> \}
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
> {quote}
> _handleUpdateNotification()_ is always fast since it submits real work to
> thread pool; so, it does not block, and introducing additional
> synchronization should not affect concurrency.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)