Re: Review Request 60926: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Sergio Pena Wed, 19 Jul 2017 12:11:05 -0700


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 97 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line97>
> >
> >     Can you achieve the same purpose by providing client in a constructor? 
> > This may be a bit cleaner.


The constructor parameters are increasing. We have currently 6, and now we will 
provide 2 more. I think is cleaner to have setters and avoid large # of method 
parameters, don't you think?


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Line 152 (original), 159 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line159>
> >
> >     What if we have an old snapshot so it isn't empty (and in fact we keep 
> > some number of old snapshots around) so this will only be the case the 
> > first time. How is the decision made afterwords?

I created a isFullSnapshotRequired() method that have 2 conditions to check for 
full snapshots. It has better explanation there.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 165 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line165>
> >
> >     Why do we need to keep in in the class field?
> >     
> >     This is rather weird - we detect that we need a full snapshot, and 
> > instead of just getting it, record this piece of information and retrieve 
> > it during the next cycle. What is the point in doing it this way?

I did some changes on the run() method to make this cleaner. I think we should 
either create a snapshot or process notifications per run to reduce the 
conditions and make the code cleaner. If a snapshot is done, why should we try 
to fetch notifications again? Better to do either one per run I think.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 169 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line169>
> >
> >     Suppose that we have two Sentry servers. The first one got the current 
> > notification ID, but later looses leadership and another one processes a 
> > few requests by the time we get here, so we can run into this situation. So 
> > we should be a bit more conservative and may be check a few things first:
> >     
> >     1) Check that we are still the leader
> >     2) Re-read currentNotificationId and make sure it didn't advanced 
> > forward.
> >     
> >     Once we detect the condition we should just get the snapshot rather 
> > then defer it to the next iteration.

What about checking for leadership at the moment of persisting the snapshot? 
Otherwise, I feel we'll have to have this check evertime we do something on the 
code making it ugly with too many leadershipt checks, and perhaps future 
developers won't know that we would need this check.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 185 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line185>
> >
> >     Does HMS provide a guarantee that notifications have no gaps?

HMS events ID are incremented by the application and each notification is 
persisted in the same transaction as the DDL operation executed, so this should 
guarantee we have consecutive identifiers and no gaps happened. The gap I was 
talking about here is a gap when HMS clean-up older notifications if Sentry was 
shutdown for a long period of time.

Anyway, I redo this part, and made it clear to differentiate between gaps and 
out-of-sync situations.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 210 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line210>
> >
> >     Please document this function.

This does not longer exists.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 217 (patched)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778739#file1778739line217>
> >
> >     This is a strange way of doing it. Why can't we check the first 
> > notification instead? Why are we checking the size instead? What is it that 
> > you are actually checking?
> >     
> >     What if HMS notifications contain duplicate IDs (and we know that this 
> > is possible) - this will mess up the logic.

Right. I didn't thinkg about it.
I did the change.


> On July 19, 2017, 5:20 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
> > Line 217 (original)
> > <https://reviews.apache.org/r/60926/diff/4/?file=1778740#file1778740line217>
> >
> >     Is this no longer relevant or the logic is moved elsewhere?

I did a change on the HMSfollower.run() to either get a full snapshot or 
process notifications. I left this code now because the check is not happening 
anymore. Btw, I did the run() change to make the code cleaner.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60926/#review180946
-----------------------------------------------------------


On July 18, 2017, 8:23 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60926/
> -----------------------------------------------------------
> 
> (Updated July 18, 2017, 8:23 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and 
> Na Li.
> 
> 
> Bugs: sentry-1760
>     https://issues.apache.org/jira/browse/sentry-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'requestHmsSnapshot' to TRUE whenever the following 
> cases are found:
> 
> * List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a 
> while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by 
> Sentry causing a gap when retrieving notifications.
> * Latest Sentry notification ID processed is bigger than current HMS 
> notification ID.
>   This may happen when the HMS DB data was reset or restore from an old 
> snapshot causing sync issues with Sentry.
> 
> New snapshots are persisted with a new generation ID (or image number), so 
> there's is no need to clean-up older snapshots.
> 
> 
> Diffs
> -----
> 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
>  083ee4c247f96d5c87b44b9785663a2783e6644d 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java
>  05518e81f52965dc1ff102dcdd446010381b9a7a 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java
>  d67c16258c67aae997de4c0451c8b642ab05d298 
> 
> 
> Diff: https://reviews.apache.org/r/60926/diff/4/
> 
> 
> Testing
> -------
> 
> Added test cases on TestHmsFollower
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 60926: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Reply via email to