Hi folks,

We are looking into some alternative design choices for HDFS Sentry sync
that would persist the edit logs of both path updates (coming from HMS) and
perm updates (coming from Sentry). The main motivation is to make the HA
cases more stable: keeping the services as stateless as possible makes them
more fault tolerant and makes it easy to bring up multiple instances.

Right now, the Sentry service buffers the edit history of path updates from
HMS and of perm updates from itself in memory, and serves it to the NameNode
(NN) so that the NN can build ACLs for Hive-based files from this data. Some
options:
1. Support edit history at the source: both HMS and Sentry can implement a
WAL in their backend DBs, so that the NN can reliably request the most
recent updates from persistent storage. The recent Hive replication work
added some WAL support; it would be good to explore whether we can build on
top of it.
2. The source writes the edit history to persistent, distributed,
fault-tolerant storage such as HDFS or Kafka.
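To make option 1 a bit more concrete, here is a minimal sketch of a
per-source edit log kept in a relational backend DB. It uses Python's
sqlite3 purely as a stand-in for the HMS/Sentry backend DB; the table name
edit_log, the kind/payload columns and the EditLog class are hypothetical,
not existing HMS or Sentry APIs. The key idea is a monotonically increasing
sequence number that the NN can use as a resume point.

```python
import sqlite3

# Hypothetical schema: an append-only table holding HMS path updates and
# Sentry perm updates. AUTOINCREMENT yields a monotonically increasing
# sequence number that a reader can use as a resume point.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edit_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    kind    TEXT NOT NULL,   -- e.g. 'PATH' (from HMS) or 'PERM' (from Sentry)
    payload TEXT NOT NULL    -- serialized update; the format is an assumption
)
"""


class EditLog:
    """Minimal persistent write-ahead edit log backed by a relational DB."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def append(self, kind: str, payload: str) -> int:
        # Commit the update in its own transaction before acknowledging it,
        # so a restarted or standby service can replay it later.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO edit_log (kind, payload) VALUES (?, ?)",
                (kind, payload),
            )
        return cur.lastrowid

    def updates_since(self, last_seq: int, limit: int = 1000):
        # A reader (the NN, or Sentry acting on its behalf) asks for every
        # entry after the last sequence number it has already applied.
        rows = self.conn.execute(
            "SELECT seq, kind, payload FROM edit_log "
            "WHERE seq > ? ORDER BY seq LIMIT ?",
            (last_seq, limit),
        )
        return rows.fetchall()
```

For example, after appending one PATH and one PERM entry, updates_since(1)
returns only the PERM entry, so a reader that restarts only needs to
remember the last sequence number it applied rather than any in-memory
state.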

In case 1, the NN can either read the edit history directly from
Sentry/HMS, or Sentry can act as a liaison that serves edits to the NN. Both
approaches have advantages and disadvantages.
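On the NN side, the pull loop could look roughly the same in both variants
of case 1. The sketch below is hypothetical throughout (UpdateSource,
AclCache and InMemorySource are illustrative names, and the string ACL
format is a placeholder): the NN remembers the last sequence number it
applied and asks for everything after it. It also assumes the persistent
log is trimmed at some point, in which case an NN that has fallen too far
behind reloads a full snapshot instead of replaying deltas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple

# One update: (sequence number, path, ACL string). Layout is hypothetical.
Update = Tuple[int, str, str]


class UpdateSource(Protocol):
    """What the NN pulls from: HMS/Sentry directly, or Sentry as a liaison."""

    def oldest_seq(self) -> int: ...
    def updates_since(self, last_seq: int) -> List[Update]: ...
    def full_snapshot(self) -> Tuple[int, Dict[str, str]]: ...


@dataclass
class AclCache:
    """Hypothetical NN-side cache tracking the last applied sequence number."""

    last_seq: int = 0
    acls: Dict[str, str] = field(default_factory=dict)

    def sync(self, source: UpdateSource) -> None:
        if source.oldest_seq() > self.last_seq + 1:
            # Assumption: the log is trimmed, so entries we never saw are gone.
            # Deltas are no longer enough; reload a full image instead.
            self.last_seq, snapshot = source.full_snapshot()
            self.acls = dict(snapshot)
        for seq, path, acl in source.updates_since(self.last_seq):
            self.acls[path] = acl
            self.last_seq = seq


class InMemorySource:
    """Toy stand-in for a persistent log that keeps only recent entries."""

    def __init__(self) -> None:
        self.log: List[Update] = []
        self.state: Dict[str, str] = {}
        self.next_seq = 1

    def write(self, path: str, acl: str) -> None:
        self.log.append((self.next_seq, path, acl))
        self.state[path] = acl
        self.next_seq += 1

    def purge_before(self, seq: int) -> None:
        # Simulates log trimming: drop every entry older than `seq`.
        self.log = [u for u in self.log if u[0] >= seq]

    def oldest_seq(self) -> int:
        return self.log[0][0] if self.log else self.next_seq

    def updates_since(self, last_seq: int) -> List[Update]:
        return [u for u in self.log if u[0] > last_seq]

    def full_snapshot(self) -> Tuple[int, Dict[str, str]]:
        return self.next_seq - 1, dict(self.state)
```

With Sentry as a liaison, the NN would presumably hold a single cursor over
a merged stream; reading directly from Sentry and HMS would mean one cursor
per source, which is one of the trade-offs to weigh.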

Let me know your thoughts and I can go into details.

Thanks!
