Lina, can you describe the problem you have discovered with approach 2.2?
On Mon, Jul 24, 2017 at 6:37 PM, Na Li <lina...@cloudera.com> wrote:
> Sasha,
>
> I realized a serious issue with approach 2.2): it does not function correctly. So I re-iterate the approaches and their pros and cons.
>
> This leaves us options 2.1) and 2.3). The question is whether the performance of approach 2.1) is acceptable for now, so that we can work on the long-term solution 2.3) later.
>
> 2.1) Find out why we have collisions on a single Sentry server with synchronization on saving updates. Once we find the cause, fix it.
> + Follows the existing approach. Does not introduce a big change to the code base.
> + Functions correctly. There is no hole in the changeID sequence.
> - Needs time to investigate why synchronization at the application level on a single Sentry server does not prevent key collisions.
> - Does not scale. All updates are serialized, with little concurrency.
> - Still has key-collision exceptions and transaction failures when more than one Sentry server is deployed.
> - Transaction failure at collision increases the time to execute a transaction.
> - It is confusing to customers that transactions fail during normal operation. This increases support cases.
>
> 2.2) Auto-increment the changeID and send delta changes as much as possible
>
> The patch achieves a 5x or greater performance increase over the current approach.
>
> It contains the following changes:
>
> - Revert SENTRY-1795 (so the changeID is auto-incremented; this avoids key collisions and is the main reason for the performance improvement).
> - Revert SENTRY-1824 (no need to synchronize when the changeID is auto-incremented).
> - Get a continuous delta list from SentryStore even when the delta list has a hole (for example, if the list is 1,2,3,5,6, return 1,2,3; if the hole is at the front of the list, return a full snapshot).
>
> + Relatively small changes. Verified working with good performance when there are no transaction failures.
> + When the hole in the delta list is temporary (a transaction in flight), returning the continuous delta list deals with the hole effectively. Most likely, the hole will disappear the next time HDFS requests changes.
> - When there is a transaction failure (the hole in the changeID sequence is permanent), Sentry sends back a full snapshot, which is expensive. If we could detect a permanent hole, we would not need to send the full snapshot, which is very expensive and may exhaust memory for a big customer.
> - When there is a temporary hole at the start of the list, even sending a full snapshot can skip the transaction in flight, so it does not function correctly.
>
> For example, suppose the list is 1,2,3,5,6, and the transaction with changeID 4 is not done.
>
> First, Sentry returns 1,2,3.
>
> In the next round, 4 is still not done, so Sentry gets the list 5,6 with a hole at the front. Sentry sends back the full snapshot with changeID 6.
>
> The next time, HDFS will ask for changes after 6, and the transaction with changeID 4 will be skipped even though it may have completed.
>
> 2.3) Use timestamps to sort the changes
>
> a) Use a timestamp, such as MSentryPermChange.createTimeMs or MSentryPathChange.createTimeMs, to sort the entries. If more than one entry has the same timestamp, use the changeID to break the tie.
> b) HDFS asks for updates using these timestamp values instead of the changeID. The Sentry server sends back changes at and after this timestamp. HDFS keeps the list of changeIDs associated with the requesting timestamp and skips entries already processed. This handles the situation where more than one entry has the same timestamp, some sent in the previous request and some to be sent in the next request.
> c) The changeID remains the primary key, used only to uniquely identify the entry; it is not required to be sequential or consecutive.
> d) Purge the change entries in the DB using the timestamp instead of the changeID. For example, keep 3 polling intervals of entries so that HDFS can get the changes before they are purged.
> + Sentry sends a full snapshot to HDFS only the first time, when HDFS starts, and then always sends delta changes. For path changes, if Sentry got a new full snapshot from HMS, it sends a full path snapshot.
> + High concurrency. Scales well with a large number of clients.
> - Relatively big code change for the API between the Sentry server and the Sentry plugin in HDFS.
> - No easy way to detect that HDFS has received and processed all updates.
> - If a DB admin moves the DB server time backwards, this will mess up the timestamp ordering.
>
> Thanks,
>
> Lina
>
> On Fri, Jul 21, 2017 at 5:53 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>
> > Lina, Thank you for sending a detailed summary!
> >
> > > On Jul 22, 2017, at 12:29 AM, Na Li <lina...@cloudera.com> wrote:
> > >
> > > Hello,
> > >
> > > Can you provide your feedback on this issue? We need to make a decision soon.
> > >
> > > 1) Summary
> > >
> > > The current approach updates the changeID (the primary key for permission changes and path changes) manually. In a single transaction, the code reads the max changeID from the DB, increases it by one, and saves the value in the new change entry. If two threads are adding changes to the DB at the same time, a collision happens (the primary key does not allow multiple entries with the same value), and one transaction fails. The failed transaction then goes through multiple retries. If the retry count reaches the max value, the transaction fails.
> > >
> > > In our stress testing on a single Sentry server, with 15 clients doing grant/revoke operations concurrently, we saw multiple transaction failures, and the exponential-backoff retry increases the latency of every transaction in Sentry.
> > > We have a serious performance issue saving permission and path updates.
> >
> > How important is the performance issue? Do we expect a high rate of permission updates in realistic scenarios? Do we have any idea of the currently supported rate of permission updates?
> >
> > Note that in our single-Sentry-server case the code that updates the DB is serialized, and thus we shouldn't see any collisions at the DB level at all. This means that there is some other bug hiding here, still undiscovered. The retries and exponential backoff shouldn't apply in the single-server case at all.
> >
> > > 2) Potential solutions
> > >
> > > 2.1) Find out why we have collisions on a single Sentry server with synchronization on saving updates. Once we find the cause, fix it.
> >
> > +1 on that
> >
> > > + Follows the existing approach. Does not introduce a big change to the code base.
> > > - Needs time to investigate why synchronization at the application level on a single Sentry server does not prevent key collisions.
> > > - Does not scale. All updates are serialized, with little concurrency.
> > > - Still has key-collision exceptions and transaction failures when more than one Sentry server is deployed.
> > > - Transaction failure at collision increases the time to execute a transaction.
> > > - It is confusing to customers that transactions fail during normal operation. This increases support cases.
> > >
> > > 2.2) CDH-transactionFailure.01-cdh5-1.5.1.patch
> > > The patch achieves a 5x or greater performance increase over the current approach. It contains the following changes:
> > > - Revert SENTRY-1795 (so the changeID is auto-incremented; this avoids key collisions and is the main reason for the performance improvement).
> > > - Revert SENTRY-1824 (no need to synchronize when the changeID is auto-incremented).
> > > - Get a continuous delta list from SentryStore even when the delta list has a hole (for example, if the list is 1,2,3,5,6, return 1,2,3; if the hole is at the front of the list, return a full snapshot).
> >
> > Please document the algorithm you are suggesting in more detail. How do you propose to handle synchronization between the NN and the Sentry server using auto-incremented IDs?
> >
> > Performance is an important consideration once we achieve correctness. It would be great to have a *correct* solution that also performs well.
> >
> > > + Relatively small changes. Verified working with good performance when there are no transaction failures.
> > > + When the hole in the delta list is temporary (a transaction in flight), returning the continuous delta list deals with the hole effectively. Most likely, the hole will disappear the next time HDFS requests changes.
> > > - When there is a transaction failure (the hole in the changeID sequence is permanent), Sentry sends back a full snapshot, which is expensive. If we could detect a permanent hole, we would not need to send the full snapshot, which is very expensive and may exhaust memory for a big customer.
> >
> > Right, sending a full snapshot is extremely expensive. How do you propose to detect such a permanent hole?
> >
> > > 2.3) Use timestamps to sort the changes
> > >
> > > a) Use a timestamp, such as MSentryPermChange.createTimeMs or MSentryPathChange.createTimeMs, to sort the entries. If more than one entry has the same timestamp, use the changeID to break the tie.
> >
> > IMO we should use either timestamps or changeIDs, not both.
> >
> > > b) HDFS asks for updates using these timestamp values instead of the changeID.
> >
> > I think that this approach may cause some updates to be lost.
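[Editor's note: the client-side bookkeeping Lina proposes in 2.3 b), keeping the changeIDs already seen at the requesting timestamp and skipping them on the next poll, could be sketched roughly as below. The function and parameter names are hypothetical, not Sentry code.]

```python
def select_new_changes(changes, last_ts, seen_ids_at_last_ts):
    """Pick the change entries HDFS has not applied yet.

    changes: (create_time_ms, change_id) pairs returned by the server
    for "at or after last_ts", per 2.3 b). Entries that share last_ts
    but were already processed in a previous poll are skipped.
    """
    new_changes = []
    # Sort by (timestamp, changeID), the ordering proposed in 2.3 a).
    for ts, change_id in sorted(changes):
        if ts == last_ts and change_id in seen_ids_at_last_ts:
            continue  # already applied in an earlier request
        new_changes.append((ts, change_id))
    return new_changes
```

With changes = [(100, 7), (100, 8), (101, 9)] and changeID 7 already seen at timestamp 100, only (100, 8) and (101, 9) are returned, which is how entries sharing one timestamp can be split across two polls without duplication.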
> > > c) The changeID remains the primary key, used only to uniquely identify the entry; it is not required to be sequential or consecutive.
> > > d) Purge the change entries in the DB using the timestamp instead of the changeID. For example, keep 3 polling intervals of entries so that HDFS can get the changes before they are purged.
> >
> > Suppose that you do this purging. Later you restart the NN and it asks for all updates since time T, but a bunch of updates since that time were purged, so they never reach the NN and are permanently lost. And it is impossible to detect that this happened.
> >
> > > + Sentry sends a full snapshot to HDFS only the first time, when HDFS starts, and then always sends delta changes.
> >
> > What if HDFS wasn't able to talk to Sentry for some time and in the meantime some entries were purged?
> >
> > > + High concurrency. Scales well with a large number of clients.
> > > - Relatively big code change for the API between the Sentry server and the Sentry plugin in HDFS.
> > > - No easy way to detect that HDFS has received and processed all updates.
> > >
> > > 3) Decisions
> > >
> > > My suggestion is that we take approach 2.2) for the short term and take the hit of a full snapshot when there is a transaction failure, and take approach 2.3) as the long-term solution.
> >
> > We can't make the decision since we do not have enough information about approach 2.2 - how does it actually work? This may be a really good solution if we can prove that it works.
> >
> > - Sasha
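[Editor's note: the continuous-delta-list rule at the heart of approach 2.2 (given IDs 1,2,3,5,6 return 1,2,3; a hole at the front forces a full snapshot) could look roughly like the sketch below. The names are made up for illustration; this is not the actual CDH patch.]

```python
def continuous_delta_prefix(change_ids, last_seen_id):
    """Return the longest consecutive run of changeIDs starting at
    last_seen_id + 1. An empty result means the hole is at the front
    of the list, and the caller must fall back to a full snapshot.
    """
    prefix = []
    expected = last_seen_id + 1
    for change_id in sorted(change_ids):
        if change_id != expected:
            break  # hole found; stop before it
        prefix.append(change_id)
        expected += 1
    return prefix
```

So continuous_delta_prefix([1, 2, 3, 5, 6], 0) yields [1, 2, 3], and continuous_delta_prefix([5, 6], 3) yields an empty list, matching the two cases in Lina's example, including the failure mode she describes: once the full snapshot at changeID 6 is sent, the in-flight change 4 is skipped forever.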