> On Jan. 22, 2018, 8:49 p.m., Vadim Spector wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
> > Lines 67 (patched)
> > <https://reviews.apache.org/r/64955/diff/4/?file=1943745#file1943745line68>
> >
> >     I wonder about the choice of LRUMap for cache.
> >     
> >     The ideal data structure would be one that only stores cache entries 
> > for the events in the ID range that we want, which is [MAX(event_id) - 
> > refetch_count, MAX(event_id)].
> >     
> >     LRUMap does not explicitly address this requirement. It assumes that 
> > the oldest (least accessed) cache entries will tend to be for the events 
> > with the oldest IDs. We know this is often not true, which is one of the 
> > problems this JIRA tries to work around.
> >     
> >     Given that we check SentryStore anyway when we do not find the entry 
> > in the cache, everything will still work. But we may end up with a few 
> > extra DB calls to SentryStore, and it's hard to predict the overhead.
> >     
> >     A cache structure solving this shortcoming could be a 
> > HashSet<String>[notificationFetchCount + 1] array.
> >     
> >     It's a circular buffer structure where the cache[ID % cache.length] 
> > element contains the HashSet of hashes for all events with the given 
> > event ID. We can check just as fast whether an event has been processed.
> >     
> >     Before each fetch we can calculate which elements in this cache array 
> > correspond to IDs outside the current fetching range and nullify them 
> > (for a circular buffer some care is needed with the modulo arithmetic).
> >     
> >     This way we know that:
> >     a) the cache stores entries only for events in the range we care 
> > about, though now cleanup is explicit.
> >     b) we do not necessarily need to put an upper limit on the number of 
> > events with a given ID (the sizes of the HashSets).
> >     
> >     It may save some expensive DB calls.
> >     
> >     However, I do not feel strongly one way or the other until we test it 
> > and see the overhead. Perhaps the x3 fudge factor in the LRUMap size is 
> > enough to retain the entries we need, old or new; so I'm leaving it up to you.
> 
> Na Li wrote:
>     Vadim, I prefer to use a circular buffer as well. I mentioned this to 
> Kalyan before. The memory usage and execution overhead should be much 
> smaller than with LRUMap.
> 
> Vadim Spector wrote:
>     I like that if the entry is not found in the cache, there is no need to 
> go to the database to double-check. Can't beat that.
> 
> kalyan kumar kalvagadda wrote:
>     Help me understand your comment.
>     "LRUMap does not explicitly address this requirement. It assumes that the 
> oldest (least accessed) caches will tend to be for the events with the oldest 
> IDs. We know this is often not true, which is one of the problems this JIRA 
> tries to work around."
>     
>     Can you give me one example where you think it would not be found in the 
> cache?

For simplicity of the example, suppose we look back by 2 IDs. It means that we 
fetch with starting number (MaxID - 2) to include 3 IDs: (MaxID-2, MaxID-1, 
MaxID). Therefore, the LRUMap size is 3 * 3 = 9.
Suppose the LRUMap starts out empty (just to keep the explanation simple). 
Then we perform several fetches, in the exact order specified (the first 
listed ID arrives first):

fetch #1 (start_id = 0): 3, 3, 3, 2, 2, 2, 1, 1, 1
         Out of order; this can happen, and the entries for ID = 3 are the 
oldest ones. The LRUMap reaches its max size of 9. maxID = 3.

fetch #2 (start_id = 3 - 2 = 1):  3, 3, 3, 2, 2, 2, 1, 1, 1 (re-fetched from 
fetch #1), 3, 4 (new events)
         The LRUMap automatically purges the first 2 entries with ID = 3 to 
accommodate the new events 3 and 4. maxID = 4.

fetch #3 (start_id = 4 - 2 = 2): 3, 3, 3, 2, 2, 2 (refetched from fetch #1, 
excluding ID = 1, as it is < 2), 3, 4 (refetched from fetch #2), 5 (new)

Let's stop here. In fetch #3 we still get the first two events with ID = 3, 
which appeared originally in fetch #1 and again in fetch #2. However, they are 
no longer in the LRUMap, having been purged after fetch #2, so we end up with 
DB lookups.
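
To make the eviction concrete, here is a small, self-contained simulation of 
this sequence. It is illustrative only: it uses java.util.LinkedHashMap in 
access order as a stand-in for the commons-collections LRUMap, and 
hypothetical "<id>:<hash>" strings rather than the actual patch's cache keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative simulation (not the actual patch) of how an access-ordered
// LRU cache of max size 9 evicts the oldest ID = 3 entries in fetch #2.
public class LruEvictionDemo {
    static final int MAX_SIZE = 9;

    // LinkedHashMap with accessOrder=true behaves like an LRU cache once
    // removeEldestEntry enforces the size bound.
    static Map<String, Boolean> newLruCache() {
        return new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > MAX_SIZE;
            }
        };
    }

    // Returns the keys still cached after fetch #2 from the example above.
    public static Set<String> simulateAfterFetch2() {
        Map<String, Boolean> cache = newLruCache();
        // fetch #1: 3, 3, 3, 2, 2, 2, 1, 1, 1 (ID = 3 arrives first)
        String[] fetch1 = {"3:a", "3:b", "3:c", "2:a", "2:b", "2:c",
                           "1:a", "1:b", "1:c"};
        for (String k : fetch1) cache.put(k, true);
        // fetch #2: the same nine events are re-fetched (each cache hit
        // refreshes the entry's LRU position), then two new events arrive.
        for (String k : fetch1) cache.get(k);
        cache.put("3:d", true); // evicts 3:a, the least recently accessed
        cache.put("4:a", true); // evicts 3:b
        return cache.keySet();
    }

    public static void main(String[] args) {
        Set<String> cached = simulateAfterFetch2();
        System.out.println("3:a cached? " + cached.contains("3:a")); // false
        System.out.println("3:b cached? " + cached.contains("3:b")); // false
        System.out.println("3:c cached? " + cached.contains("3:c")); // true
    }
}
```

So after fetch #2 the entries for the first two ID = 3 events are already 
gone, and fetch #3 has to fall through to SentryStore for them.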

I just provided the simplest example. It is probably extreme that the event 
with the biggest ID in the fetch arrives first in the fetch, as in fetch #1. 
However, it can arrive somewhere in the middle of the fetch too, and some time 
later the LRU will still purge it ahead of events with smaller IDs that 
should've been purged first.

As I was working through this example, I realized that things could be better 
if, after each fetch, the events were first sorted by their IDs before being 
added to the LRUMap. I do not think it will entirely solve the problem of 
prematurely purged entries. Here is an example where all IDs get sorted prior 
to putting them into the LRUMap:

fetch #1 (start_id = 0): 1, 1, 1, 2, 2, 2, 3, 3, 3
         All in order. MaxID = 3

fetch #2 (start_id = 3 - 2 = 1): 1, 1, 1, 2, 2, 2, 3, 3, 3 (from fetch #1), 1, 
1, 1, 1, 2, 2, 2 (new), 4 (new). MaxID = 4
         The newly fetched entries replace all entries with ID = 2 and some 
entries with ID = 3 in the LRUMap.
         
fetch #3 (start_id = 4 - 2 = 2): 2, 2, 2, 2, 3, 3, 3, 4 (old), 5 (new)
         The old entries for IDs 2 and 3 are gone from the LRUMap, which is 
now filled with useless out-of-fetch-range entries with ID = 1. We end up 
with DB lookups for the IDs 2 and (some of) the 3s.

Does it make sense? The problem is that "oldest LRUMap entries" and "LRUMap 
entries with the smallest IDs" are not the same thing.
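
For comparison, here is a minimal sketch of the circular-buffer structure I 
described earlier. All names are hypothetical, and a real implementation would 
need to guard against IDs below the current window (as noted, some care is 
needed with the modulo arithmetic):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a ring-buffer cache: slots[id % slots.length] holds the hashes
// of all processed events with that event ID. Slots are nullified explicitly
// as the fetch window [maxId - fetchCount, maxId] advances, so eviction is
// driven by event ID rather than by access recency.
public class EventIdRingCache {
    private final Set<String>[] slots;
    private long maxId = -1;

    @SuppressWarnings("unchecked")
    public EventIdRingCache(int fetchCount) {
        slots = new Set[fetchCount + 1];
    }

    private int slot(long id) {
        return (int) (id % slots.length);
    }

    // Clear slots that fall out of the fetch window before they get reused.
    public void advanceTo(long newMaxId) {
        for (long id = maxId + 1; id <= newMaxId; id++) {
            slots[slot(id)] = null;
        }
        maxId = Math.max(maxId, newMaxId);
    }

    public void add(long id, String eventHash) {
        advanceTo(id);
        if (slots[slot(id)] == null) {
            slots[slot(id)] = new HashSet<>();
        }
        slots[slot(id)].add(eventHash);
    }

    // A miss here is definitive for IDs inside the window: no DB re-check
    // is needed, unlike with the LRUMap.
    public boolean contains(long id, String eventHash) {
        Set<String> hashes = slots[slot(id)];
        return hashes != null && hashes.contains(eventHash);
    }
}
```

With fetchCount = 2 (a window of 3 IDs), adding events 1, 2, 3 and then 
advancing to maxID = 4 clears the slot that held ID 1, exactly the explicit 
cleanup described above.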

I could've probably come up with easier examples; these are just the first 
that came to my mind.

That said, sorting did make the second (failing) example less realistic than 
the first one.
It means that at least sorting the fetched events by their IDs prior to 
updating the LRUMap could be a pretty easy yet meaningful improvement to the 
existing implementation.
I would probably be comfortable with just that improvement - what do you think?
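
The sorting step itself would be tiny. A sketch, where Event is a stand-in 
for the actual Hive metastore notification type:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: sort each fetched batch by event ID before adding it
// to the LRUMap, so that within a batch the entries with the smallest IDs
// are the first candidates for LRU eviction.
public class SortBeforeCaching {
    public static class Event {
        public final long eventId;
        public final String hash;

        public Event(long eventId, String hash) {
            this.eventId = eventId;
            this.hash = hash;
        }
    }

    public static Event[] sortByEventId(Event[] batch) {
        Event[] sorted = batch.clone();
        Arrays.sort(sorted, Comparator.comparingLong((Event e) -> e.eventId));
        return sorted;
    }

    public static void main(String[] args) {
        // A batch arriving out of order as 3, 1, 2 ...
        Event[] batch = {new Event(3, "a"), new Event(1, "b"), new Event(2, "c")};
        // ... gets cached in ID order 1, 2, 3 instead.
        for (Event e : sortByEventId(batch)) {
            System.out.println(e.eventId);
        }
    }
}
```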


- Vadim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64955/#review195917
-----------------------------------------------------------


On Jan. 22, 2018, 5:40 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64955/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2018, 5:40 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, Sergio 
> Pena, and Vadim Spector.
> 
> 
> Bugs: SENTRY-2109
>     https://issues.apache.org/jira/browse/SENTRY-2109
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> This patch does a couple of things:
> 1. Avoids triggering full snapshots when gaps are observed while fetching 
> new notifications. While fetching new notifications, HMSFollower also 
> fetches the notification with the last processed event-id. If the fetched 
> notifications do not include that event-id, a full snapshot is triggered.
> 2. Adds functionality to address gaps and out-of-sequence notifications by 
> re-fetching notifications that were already fetched. This solution is not 
> foolproof; it makes a best effort to reduce the chance of losing 
> notifications. This approach introduces the overhead of fetching 
> notifications that were already fetched. The overhead of the DB look-up is 
> addressed by using a cache, which reduces the additional DB lookups needed 
> to check whether a notification was already processed.
> 3. Adds logging to report duplicate events and out-of-order events for 
> debugging purposes.
> 4. Adds new e2e tests to verify this behavior.
> 
> 
> Diffs
> -----
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/exception/SentryOutOfSyncException.java
>  PRE-CREATION 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  6c4631fa74760d8721b5740dd3dffb2c1d8e72e6 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
>  aa1b6a31c28f35af86952c213d5e20a8c9bb3490 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
>  097aa62912e92ece7f8da0ac0fccb124579a88f2 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryService.java
>  43535a7b50fea51049f3142837736e6a99a3a80f 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java
>  7e02874b4be6a7109108809b1f404fe971b7b8e2 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollowerSentryStoreIntegration.java
>  501898bca261db2daf937a8d803d12a59616192b 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  b4100278392986c161625a366212c6fef66ec0a9 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java
>  edde886a7646539499149f2d86758979436567bd 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java
>  83a1becd46ac2d69c7d6dd05ed6253d1cdd8800d 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreation.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreationWithShorterHMSEventTtl.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotWithLongerHMSFollowerLongerInterval.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java
>  4cd00e6672730773c74b9840247d1f4d5f7bdfe4 
> 
> 
> Diff: https://reviews.apache.org/r/64955/diff/4/
> 
> 
> Testing
> -------
> 
> Made sure that tests pass.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>
