Lina, can you describe the problem you have discovered with approach 2.2?
On Mon, Jul 24, 2017 at 6:37 PM, Na Li <lina...@cloudera.com> wrote:
> Sasha,
>
> I realized a serious issue with approach 2.2): it does not function correctly. So I re-iterate the approaches and their pros and cons.
>
> This leaves us options 2.1) and 2.3). The question is whether the performance of approach 2.1) is acceptable for now, so that we can work on the long-term solution 2.3) later.
>
> 2.1) Find out why we have collisions on a single Sentry server with synchronization on saving updates. Once we find the cause, fix it.
> + Follows the existing approach. Does not introduce a big change to the code base.
> + Functions correctly. There is no hole in the changeID sequence.
> - Needs time to investigate why synchronization at the application level on a single Sentry server does not prevent key collisions.
> - Does not scale. All updates are serialized, with little concurrency.
> - Still has key-collision exceptions and transaction failures when more than one Sentry server is deployed.
> - Transaction failure at collision increases the time to execute a transaction.
> - It is confusing to customers that transactions fail during normal operation. This increases support cases.
>
> 2.2) Auto-increment the changeID and send delta changes as much as possible
>
> The patch achieves a 5x or greater performance increase over the current approach.
>
> It contains the following changes:
>
> - Revert SENTRY-1795 (so the changeID is auto-incremented; this avoids key collisions and is the main reason for the performance improvement).
> - Revert SENTRY-1824 (no need to synchronize when the changeID is auto-incremented).
> - Get a continuous delta list from SentryStore even when the delta list has a hole (for example, if the list is 1,2,3,5,6, return 1,2,3; if the hole is at the front of the list, return a full snapshot).
>
> + Relatively small changes. Verified working with good performance when there are no transaction failures.
> + When the hole in the delta list is temporary (a transaction in flight), returning the continuous delta list deals with the hole effectively. Most likely, the hole will disappear the next time HDFS requests changes.
> - When there is a transaction failure (the hole in the changeID sequence is permanent), Sentry sends back a full snapshot, which is expensive. If we could detect a permanent hole, we would not need to send the full snapshot, which is very expensive and may exhaust memory for a big customer.
> - When there is a temporary hole at the start of the list, even sending a full snapshot can skip the transaction in flight, so it does not function correctly.
>
> For example, suppose the list is 1,2,3,5,6, and the transaction with changeID 4 is not done.
>
> First, Sentry returns 1,2,3.
>
> In the next round, 4 is still not done, so Sentry gets the list 5,6 with a hole at the front. Sentry sends back the full snapshot with changeID 6.
>
> The next time, HDFS will ask for changes after 6, and the transaction with changeID 4 will be skipped even though it may have completed.
>
> 2.3) Use timestamps to sort the changes
>
> a) Use a timestamp, such as MSentryPermChange.createTimeMs or MSentryPathChange.createTimeMs, to sort the entries. If more than one entry has the same timestamp, use the changeID to break the tie.
> b) HDFS asks for updates using these timestamp values instead of the changeID. The Sentry server sends back changes at and after this timestamp. HDFS keeps the list of changeIDs associated with the requesting timestamp and skips entries already processed. This handles the situation where more than one entry has the same timestamp, some sent in the previous request and some to be sent in the next request.
> c) The changeID remains the primary key, used only to uniquely identify the entry; it is not required to be sequential or consecutive.
> d) Purge the change entries in the DB using the timestamp instead of the changeID. For example, keep 3 polling intervals of entries so that HDFS can get the changes before they are purged.
> + Sentry sends a full snapshot to HDFS only the first time, when HDFS starts, and then always sends delta changes. For path changes, if Sentry got a new full snapshot from HMS, it sends a full path snapshot.
> + High concurrency. Scales well with a large number of clients.
> - Relatively big code change for the API between the Sentry server and the Sentry plugin in HDFS.
> - No easy way to detect that HDFS has received and processed all updates.
> - If a DB admin moves the DB server time backwards, this will mess up the timestamp ordering.
>
> Thanks,
>
> Lina
>
> On Fri, Jul 21, 2017 at 5:53 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>
> > Lina, Thank you for sending a detailed summary!
> >
> > > On Jul 22, 2017, at 12:29 AM, Na Li <lina...@cloudera.com> wrote:
> > >
> > > Hello,
> > >
> > > Can you provide your feedback on this issue? We need to make a decision soon.
> > >
> > > 1) Summary
> > >
> > > The current approach updates the changeID (the primary key for permission changes and path changes) manually. In a single transaction, the code reads the max changeID from the DB, increases it by one, and saves the value in the new change entry. If two threads are adding changes to the DB at the same time, a collision happens (the primary key does not allow multiple entries with the same value), and one transaction fails. The failed transaction then goes through multiple retries. If the retry count reaches the max value, the transaction fails.
> > >
> > > In our stress testing on a single Sentry server, with 15 clients doing grant/revoke operations concurrently, we saw multiple transaction failures, and the exponential-backoff retry increases the latency of every transaction in Sentry.
> > > We have a serious performance issue saving permission and path updates.
> >
> > How important is the performance issue? Do we expect a high rate of permission updates in realistic scenarios? Do we have any idea of the currently supported rate of permission updates?
> >
> > Note that in our single-Sentry-server case the code that updates the DB is serialized, and thus we shouldn't see any collisions at the DB level at all. This means that there is some other bug hiding here, still undiscovered. The retries and exponential backoff shouldn't apply in the single-server case at all.
> >
> > > 2) Potential solutions
> > >
> > > 2.1) Find out why we have collisions on a single Sentry server with synchronization on saving updates. Once we find the cause, fix it.
> >
> > +1 on that
> >
> > > + Follows the existing approach. Does not introduce a big change to the code base.
> > > - Needs time to investigate why synchronization at the application level on a single Sentry server does not prevent key collisions.
> > > - Does not scale. All updates are serialized, with little concurrency.
> > > - Still has key-collision exceptions and transaction failures when more than one Sentry server is deployed.
> > > - Transaction failure at collision increases the time to execute a transaction.
> > > - It is confusing to customers that transactions fail during normal operation. This increases support cases.
> > >
> > > 2.2) CDH-transactionFailure.01-cdh5-1.5.1.patch
> > > The patch achieves a 5x or greater performance increase over the current approach. It contains the following changes:
> > > - Revert SENTRY-1795 (so the changeID is auto-incremented; this avoids key collisions and is the main reason for the performance improvement).
> > > - Revert SENTRY-1824 (no need to synchronize when the changeID is auto-incremented).
> > > - Get a continuous delta list from SentryStore even when the delta list has a hole (for example, if the list is 1,2,3,5,6, return 1,2,3; if the hole is at the front of the list, return a full snapshot).
> >
> > Please document the algorithm you are suggesting in more detail. How do you propose to handle synchronization between the NN and the Sentry server using auto-incremented IDs?
> >
> > Performance is an important consideration once we achieve correctness. It would be great to have a *correct* solution that also performs well.
> >
> > > + Relatively small changes. Verified working with good performance when there are no transaction failures.
> > > + When the hole in the delta list is temporary (a transaction in flight), returning the continuous delta list deals with the hole effectively. Most likely, the hole will disappear the next time HDFS requests changes.
> > > - When there is a transaction failure (the hole in the changeID sequence is permanent), Sentry sends back a full snapshot, which is expensive. If we could detect a permanent hole, we would not need to send the full snapshot, which is very expensive and may exhaust memory for a big customer.
> >
> > Right, sending a full snapshot is extremely expensive. How do you propose to detect such a permanent hole?
> >
> > > 2.3) Use timestamps to sort the changes
> > >
> > > a) Use a timestamp, such as MSentryPermChange.createTimeMs or MSentryPathChange.createTimeMs, to sort the entries. If more than one entry has the same timestamp, use the changeID to break the tie.
> >
> > IMO we should use either timestamps or changeIDs, not both.
> >
> > > b) HDFS asks for updates using these timestamp values instead of the changeID.
> >
> > I think that this approach may cause some updates to be lost.
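[Editor's note: the client-side bookkeeping Lina proposes in 2.3 b), keeping the changeIDs already seen at the requesting timestamp and skipping them on the next poll, could be sketched roughly as below. The function and parameter names are hypothetical, not Sentry code.]

```python
def select_new_changes(changes, last_ts, seen_ids_at_last_ts):
    """Pick the change entries HDFS has not applied yet.

    changes: (create_time_ms, change_id) pairs returned by the server
    for "at or after last_ts", per 2.3 b). Entries that share last_ts
    but were already processed in a previous poll are skipped.
    """
    new_changes = []
    # Sort by (timestamp, changeID), the ordering proposed in 2.3 a).
    for ts, change_id in sorted(changes):
        if ts == last_ts and change_id in seen_ids_at_last_ts:
            continue  # already applied in an earlier request
        new_changes.append((ts, change_id))
    return new_changes
```

With changes = [(100, 7), (100, 8), (101, 9)] and changeID 7 already seen at timestamp 100, only (100, 8) and (101, 9) are returned, which is how entries sharing one timestamp can be split across two polls without duplication.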
> > > c) The changeID remains the primary key, used only to uniquely identify the entry; it is not required to be sequential or consecutive.
> > > d) Purge the change entries in the DB using the timestamp instead of the changeID. For example, keep 3 polling intervals of entries so that HDFS can get the changes before they are purged.
> >
> > Suppose that you do this purging. Later you restart the NN and it asks for all updates since time T, but a bunch of updates since that time were purged, so they never reach the NN and are permanently lost. And it is impossible to detect that this happened.
> >
> > > + Sentry sends a full snapshot to HDFS only the first time, when HDFS starts, and then always sends delta changes.
> >
> > What if HDFS wasn't able to talk to Sentry for some time and in the meantime some entries were purged?
> >
> > > + High concurrency. Scales well with a large number of clients.
> > > - Relatively big code change for the API between the Sentry server and the Sentry plugin in HDFS.
> > > - No easy way to detect that HDFS has received and processed all updates.
> > >
> > > 3) Decisions
> > >
> > > My suggestion is that we take approach 2.2) for the short term and take the hit of a full snapshot when there is a transaction failure, and take approach 2.3) as the long-term solution.
> >
> > We can't make the decision since we do not have enough information about approach 2.2 - how does it actually work? This may be a really good solution if we can prove that it works.
> >
> > - Sasha
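[Editor's note: the continuous-delta-list rule at the heart of approach 2.2 (given IDs 1,2,3,5,6 return 1,2,3; a hole at the front forces a full snapshot) could look roughly like the sketch below. The names are made up for illustration; this is not the actual CDH patch.]

```python
def continuous_delta_prefix(change_ids, last_seen_id):
    """Return the longest consecutive run of changeIDs starting at
    last_seen_id + 1. An empty result means the hole is at the front
    of the list, and the caller must fall back to a full snapshot.
    """
    prefix = []
    expected = last_seen_id + 1
    for change_id in sorted(change_ids):
        if change_id != expected:
            break  # hole found; stop before it
        prefix.append(change_id)
        expected += 1
    return prefix
```

So continuous_delta_prefix([1, 2, 3, 5, 6], 0) yields [1, 2, 3], and continuous_delta_prefix([5, 6], 3) yields an empty list, matching the two cases in Lina's example, including the failure mode she describes: once the full snapshot at changeID 6 is sent, the in-flight change 4 is skipped forever.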