[ 
https://issues.apache.org/jira/browse/SENTRY-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308956#comment-15308956
 ] 

Colin Patrick McCabe commented on SENTRY-872:
---------------------------------------------

bq. 2. In Section "Future work", "The HDFS Plugin Should Use Update Log IDs". 
In current design, we apply deltas in the NN plugin. I do not believe we 
necessarily buffer deltas in NN, as there is no reason. So we may want to 
remove this section.

Hmm, maybe it was unclear.  This section was about avoiding buffering deltas in 
the Sentry daemon, not about buffering the deltas in the NN itself.

bq. 3. We might want to add a section about "Sentry passive with hot cache" 
which follows active versus "Sentry passive with cold cache" which warms up 
only when it acquires leadership? I think we are inclining towards former which 
can serve requests with minimal downtime, that is acquiring leadership should 
not take too long. But might be better if we state it explicitly, so that we 
evaluate the alternatives thoroughly?

We had a good discussion about this offline.  [~hahao] suggested that we might 
be able to simplify the design if we were willing to load the cache after a 
failover.  We also discussed whether the cache could be eliminated entirely.  
My general feeling is that eliminating the cache might be more work than it 
seems, but loading it on failover might be feasible.  We are looking into it.  
Loading the cache on failover would avoid the need for the update log.
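
To make the "load the cache after a failover" idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the class name, the method names, and the dict standing in for the Sentry DB are all hypothetical and are not taken from the Sentry code base or the design doc.

```python
class SentryDaemonSketch:
    """Hypothetical model of a daemon that keeps no cache while in standby."""

    def __init__(self, db):
        self.db = db        # assumption: a dict standing in for the shared Sentry DB
        self.cache = None   # cold cache: nothing is loaded while in standby
        self.active = False

    def on_become_active(self):
        # Take a full snapshot from the DB at failover time, instead of
        # following an update log while in standby.
        self.cache = dict(self.db)
        self.active = True

    def lookup(self, path):
        if not self.active:
            raise RuntimeError("standby daemons do not serve requests")
        return self.cache.get(path)


db = {"/warehouse/t1": {"group_a:read"}}
daemon = SentryDaemonSketch(db)
daemon.on_become_active()   # simulated failover
privileges = daemon.lookup("/warehouse/t1")
```

In this sketch the cost moves to failover time: the new active daemon cannot serve requests until the snapshot has been loaded, so the delay grows with the amount of state in the DB.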

bq. 4. There are some slight alternatives we might want to consider in the path 
of propagating HMS updates to Sentry and NN. In the proposed design, we will 
need to replicate HMS <obj,path> information as well as delta changes of 
it (add/delete <obj,path>) in Sentry db for the passive to follow. Other option 
is for passive to directly talk to HMS to get these deltas. If the only 
motivation for replicating this in sentry db is bringing passive up to speed, I 
think the latter approach is preferable as there is no real need to replicate 
both info and deltas? But, other parameter to consider is around full update. 
That is, when Sentry restarts in the latter approach, we will have to trigger a 
full update from HMS. But without a proper snapshot solution in HMS, this would 
mean we will have to lock HMS writes for this period, which means HMS is not 
available for writes for this period.

Ultimately, the HIVE-7973 API is delivering information which affects the 
global sentry state, such as that a particular Hive table has been deleted or 
moved.  It makes sense for the active sentry daemon to reflect that state in 
the DB.  The standby sentry daemons don't need to use the HIVE-7973 API since 
the DB is the source of truth for them.  This keeps them all in sync and allows 
fairly rapid failovers.
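
A rough sketch of that split of responsibilities follows. Everything in it is an assumption made for illustration: the `(kind, obj, path)` event tuples are not the HIVE-7973 API, and `SharedDb`, `ActiveDaemon`, and `StandbyDaemon` are hypothetical names rather than Sentry classes.

```python
class SharedDb:
    """Hypothetical stand-in for the Sentry DB: Hive object -> HDFS path."""

    def __init__(self):
        self.obj_paths = {}


class ActiveDaemon:
    """Only the active daemon consumes HMS events and writes them to the DB."""

    def __init__(self, db):
        self.db = db

    def apply_hms_event(self, event):
        kind, obj, path = event  # assumed event shape, for illustration only
        if kind in ("add", "move"):
            self.db.obj_paths[obj] = path
        elif kind == "delete":
            self.db.obj_paths.pop(obj, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")


class StandbyDaemon:
    """A standby never talks to HMS; it follows the DB as the source of truth."""

    def __init__(self, db):
        self.db = db
        self.view = {}

    def refresh(self):
        self.view = dict(self.db.obj_paths)


db = SharedDb()
active = ActiveDaemon(db)
standby = StandbyDaemon(db)

for event in [
    ("add", "db1.t1", "/warehouse/db1/t1"),
    ("add", "db1.t2", "/warehouse/db1/t2"),
    ("move", "db1.t1", "/archive/db1/t1"),
    ("delete", "db1.t2", None),
]:
    active.apply_hms_event(event)

standby.refresh()
```

After `refresh()`, the standby's view matches what the active daemon wrote, without the standby having contacted HMS at all.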

bq. 5. Would be useful to have a detailed protocol description especially 
around what happens when different services restart, and what in memory state 
does each service rely on.

Good point.  We should add more detail here.

> Uber jira for HMS HA + Sentry HA with HDFS plugin improvements
> --------------------------------------------------------------
>
>                 Key: SENTRY-872
>                 URL: https://issues.apache.org/jira/browse/SENTRY-872
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Hdfs Plugin
>    Affects Versions: 1.5.0
>            Reporter: Sravya Tirukkovalur
>            Assignee: Sravya Tirukkovalur
>             Fix For: 1.8.0
>
>         Attachments: SENTRY-872.0.patch, SENTRY-872.pdf, SENTRY-872_design.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
