[ 
https://issues.apache.org/jira/browse/SENTRY-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606181#comment-15606181
 ] 

Vadim Spector commented on SENTRY-1508:
---------------------------------------

I don't believe I've seen expected semantics documented anywhere, so I can only 
say what the code attempts to implement.

The existing code's behavior does not distinguish between the two states. The 
constructor tries to initialize the local cache of authorized paths 
(authzPaths) and if it fails, authzPaths always stays null with no attempt to 
re-initialize it. Lack of re-initialization attempt actually makes sense, 
because when some invalid HMS entries break initialization, it must be fixed 
first, so retrying would be futile. MetastorePlugin still gets constructed and 
Hive code has no problems using it.

Once authzPaths fails to initialize (or is not yet initialized), it stays null. 
The behavior of MetastorePlugin is as follows:

1)  the synchronization thread (SyncTask) periodically makes sure that the 
local cache is in-sync with the Sentry-side. It will short-circuit when 
authzPaths == null. This fact is invisible to the client code, unless one 
checks the logs. The result - increased odds of Sentry state getting 
out-of-sync.

2) Each public callback API (addPath, removeAllPaths, removePath, 
renameAuthzObject) does two things, in that order:
a) updates local cache, i.e. authzPaths, which, of course, fails if authzPaths 
== null due to either initialization-in-progress, or failed initialization.
b) attempts to notify Sentry, which fails whenever (a) throws an exception, 
which happens for the reasons explained in (a). Sentry notification, actually, 
can fail as well, due to the comm error, and fail silently (only checking the 
logs would reveal the failure) - which is another issue.

No direct way exists for the client to check whether MetastorePlugin has been 
initialized successfully, apart from consistent failure of callback APIs.

The scope of the initial patch is to simplify detection and troubleshooting of 
the failed cache initialization + plus proactively fix some found thread 
synchronization issues affecting the normal flow. I believe, there should be no 
way of using MetastorePlugin after failed initialization, and it's 
straightforward to fix - but that that would certainly affect public APIs 
behavior and may affect Hive code. If we remove support for async 
initialization, then we address initialization failure by simply failing to 
construct MetastorePlugin (which would also affect Hive code). It is up to us 
to decide whether it is worth spending time on, or we should wait for HA where 
we'll hopefully do things right.

> MetastorePlugin.java does not handle properly initialization failure
> --------------------------------------------------------------------
>
>                 Key: SENTRY-1508
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1508
>             Project: Sentry
>          Issue Type: Bug
>            Reporter: Vadim Spector
>            Assignee: Vadim Spector
>            Priority: Critical
>         Attachments: SENTRY-1508.001.patch
>
>
> MetastorePlugin.java has several implementation issues that need to be fixed 
> by this JIRA:
> a) The state of MetastorePlugin is not communicated cleanly. The constructor 
> initializes successfully regardless of whether HMS cache initialization was 
> successful or not. The initThread.run() can easily fail when, say, 
> cacheInitializer.createInitialUpdate() fails due to misconfigured HMS, so it 
> is quite possible. The state is communicated through 3 variables: boolean 
> initComplete, Throwable initError, and UpdateableAuthzPaths authzPaths. 
> initComplete is always set to true from finally {} block, so it really 
> indicates the end of an _attempt_ to initialize the cache. It, thus, should 
> only be used to detect whether the cache initialization is still in-progress, 
> not whether initialization has been successful. To determine whether cache 
> initialization was successful, initComplete should be used together with 
> initError, which is set only when initialization fails. This is not how the 
> code implements it in more than one place, which should be clear from 
> analyzing a patch.
> b) There are several synchronization issues that may lead to the failure of 
> synchronizing with the Sentry. They usually have to do with inconsistent use 
> of monitor objects to guard access to thread-unsafe objects. E.g. 
> authzPaths.updatePartial() uses a lock created just for the scope of a single 
> call, which makes it useless for synchronization. A single read-write lock 
> should be used instead, as there is one read and one write operation 
> performed on authzPaths within MetastorePlugin. 
> c) Failure to create authzPaths is not communicated clearly to a caller. When 
> it is dereferenced from the public callback APIs, it results in 
> NullPointerException. The suggested fix of this issue in SENTRY-1270 avoids 
> NullPointerException by detecting the null authzPaths, printing error 
> message, and returning. This effectively leads to consistent failure to 
> update local cache without notifying the caller. The suggested solution would 
> be not only to throw IllegalArgumentException to the caller instead, but to 
> also specify why exactly authzPaths == null - due to initialization still 
> being in progress or due to its failure.
> d) The housekeeping thread running SyncTask to synchronize with Sentry state 
> fails when authzPaths == null. However, it keeps printing misleading "#### 
> Metastore Plugin cache has not finished initialization." message even in the 
> case of a permanent cache initialization failure. It needs to print the real 
> cause of this condition, by analyzing authzPaths together with initComplete 
> and initError values.
> e) getClient() may return null, in which case dereferencing it causes not so 
> helpful NullPointerException. Instead, while getClient() may still print 
> error message, it should also re-throw an original exception, which would 
> then be much easier to debug.
> f) Each code fork deserves log message: e.g. addPath() retrurns immediately 
> if PathsUpdate.parsePath() returns null - w/o printing any log.
> g) in SyncTask.run() - if notificationLock.tryLock() succeeds, yet the next 
> line's expression "MetastorePlugin.this.authzPaths == null" is evaluated to 
> false, run() exits w/o a chance to call notificationLock.unlock(). All the 
> code following lock.tryLock() should be inside try-catch-finally, with 
> finally's first line being lock.unlock(), as a general safe pattern.
> h) some additional misc synchronization issues may also be addressed by the 
> patch.
> What's not in the scope:
> a) the initial patch honors the original design supporting asynchronous 
> synchronization. It means that MetastorePlugin is always constructed, even if 
> it's in a broken state. It is up to the public callback APIs to properly 
> inform the caller of the broken state.
> b) however, if the reviewers decide that support of asynchronous 
> initialization is not necessary, it may be preferable to drop such support in 
> the same JIRA. In this case, second patch can be generated to only support 
> synchronous initialization. This would lead to a much simpler and robust 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to