Hello,

Below is the latest proposed approach for permission changes. Path changes will keep using the existing approach because only one thread gets HMS changes and saves them into the DB.
===============================
Requirements:
+ R1.) Upgrading from Sentry 1.8 to Sentry 2.0 should "do no harm": what is our upgrade experience, and how does performance compare?
+ R2.) Is the approach scalable in terms of supporting multiple Sentry servers and concurrent updates?
+ R3.) Are the results consistent between Sentry and NN?
+ R4.) Does it impact HMS-HA?

Constraints from Requirements:
+ C1.) Normal operations should not generate exceptions. The performance of Sentry 2.0 should be similar to or better than that of Sentry 1.8.
+ C2.) The system should work reasonably well as more Sentry servers are added to the system and handle concurrent updates.
+ C3.) Sentry and NN should apply the changes in the same order, so that NN ends up with the same result as Sentry and the two behave consistently.

Current Approach for Permission Change:
+ The changeID is the primary key and is updated manually in the Sentry application. The current max(changeID) is read from the DB, increased by 1, and then used as the primary key for the new change entry. If more than one thread saves a new change entry with the same changeID, the transaction in one thread succeeds; the transactions in the other threads fail and are retried with an exponentially increasing retry interval.

Benefits of the current approach:
+ Satisfies R3.). The changeID increases continuously, and there are no holes in changeID. Consistency is guaranteed for both deltas and full snapshots.
+ Satisfies R4.). It should not affect HMS-HA.

Issues with the current approach:
+ Violates R1.) because normal concurrent transactions can fail due to key conflicts. When a transaction fails, the exponential retry interval adds further delay. Once the load reaches a certain point, transactions fail to commit.
+ Violates R2.): it is not scalable.
In our endurance test, we saw transactions fail to commit after the maximum number of retries with two Sentry servers.

Proposed Approach for Permission Change:
+ The changeID is the primary key and is auto-incremented in the DataStore. When a transaction fails, it leaves a permanent hole in changeID. When a transaction starts early but commits after transactions with larger changeIDs, it forms a temporary hole in changeID. Once this transaction is committed, the temporary hole disappears. How long a temporary hole can exist is limited by how long a transaction can be pending and still commit successfully.
0.) Read <write_timeout> from the configuration and set it on the datastore to time out transactions that take longer than that value. In this way, we limit how long a transaction can be pending and still commit successfully. <transaction buffer time> = <buffer factor> * <write_timeout>, where <buffer factor> defaults to 2. This is how far back in time we go to capture temporary holes in changeID.
1.) Change the MSentryPermChange timestamp to be assigned by the DB, not by the Sentry server. In this way, we have a single source of truth for the timestamps of the permission change entries.
1.a) We need to make sure the timestamp (using the @CreateTimestamp annotation in DataNucleus) is set by the DB, not by DataNucleus using server time. If necessary, this can be done without @CreateTimestamp by giving the column a default value of CURRENT_TIMESTAMP(6) in the database table and not setting anything on the object. If the annotation can handle it, great, but we have to make sure of the semantics.
2.) When Sentry sends changes to NN, it also includes the "current time in DB" (Now).
3.) When NN asks for changes, it sends Sentry the last received "current time in DB" (referred to as Last_Now), together with the last processed changeID for permission changes and [imageID, changeID] for path changes.
4.) When Sentry receives the request, it sends all entries that are newer than the time Last_Now - <transaction buffer time>.
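
For illustration, the time window in steps 0.) to 4.) can be sketched as below. This is only a sketch under assumptions: an in-memory list stands in for the change table, and the names PermChange, transaction_buffer_time and changes_since are hypothetical, not Sentry APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PermChange:
    """Hypothetical stand-in for a permission change entry."""
    change_id: int          # auto-incremented by the DB
    db_timestamp: datetime  # assigned by the DB, not by a Sentry server


def transaction_buffer_time(write_timeout, buffer_factor=2):
    """<transaction buffer time> = <buffer factor> * <write_timeout>."""
    return buffer_factor * write_timeout


def changes_since(committed, last_now, write_timeout, db_now):
    """Return the entries newer than Last_Now - <transaction buffer time>,
    sorted by changeID, together with the new "current time in DB" (Now).

    `committed` holds only the entries whose transactions have committed.
    """
    cutoff = last_now - transaction_buffer_time(write_timeout)
    window = [c for c in committed if c.db_timestamp > cutoff]
    # The receiver applies the changes in increasing changeID order.
    return sorted(window, key=lambda c: c.change_id), db_now


if __name__ == "__main__":
    t0 = datetime(2017, 7, 27, 10, 0, 0)
    timeout = timedelta(seconds=30)
    c1 = PermChange(1, t0)
    c2 = PermChange(2, t0 + timedelta(seconds=1))  # commits late: temporary hole
    c3 = PermChange(3, t0 + timedelta(seconds=2))

    # First request: changeID 2 is not committed yet.
    first, now1 = changes_since([c1, c3], t0 - timedelta(seconds=1),
                                timeout, t0 + timedelta(seconds=3))
    # Second request: changeID 2 has committed and falls inside the window.
    second, _ = changes_since([c1, c2, c3], now1,
                              timeout, t0 + timedelta(seconds=40))
    print([c.change_id for c in first], [c.change_id for c in second])
```

With a 30-second write timeout the window reaches back 60 seconds, so the late change is picked up on the second request, at the cost of sending changeIDs 1 and 3 again.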
NN applies the changes in increasing order of changeID.

Benefits of the proposed approach:
+ Satisfies requirements R1.), R2.), R3.) and R4.). It should not affect HMS-HA.

Issues with the proposed approach:
+ More complicated to implement than the current approach.
+ Requires changing the protocol API between NN and Sentry to pass the "current time in DB".
+ Needs to re-apply some old updates. This may cause more overhead, but it should be reasonable. We can optimize how old updates are re-applied; that will be done in the next release, as the change will be local to NN.

Background on R3.)
There are several types of transaction order:
TO1) The order of transaction start time;
TO2) The order of permission SQL execution time;
TO3) The order of transaction commit time.

Which order affects the result? NN should apply permission changes in the order that determines the result, so that it behaves consistently with Sentry. Our test results show that TO2) is the order that determines the result.

Two transactions:
  T_1: Transaction #1
  S_1: SQL in Transaction #1
  V_1: Value of Transaction #1

End Result            | Transaction Commit Order
SQL Execution Order   | T_1 First   | T_2 First
----------------------+-------------+-----------
S_1 First             | V_2         | V_2
S_2 First             | V_1         | V_1

Conclusion: SQL execution order determines the end result.

The next question is how to capture the permission SQL execution order.
One approach is to get the permission execution timestamp and save it in the permission change entry. This brings two issues:
a) It requires a lot of code changes.
b) It is possible to have two transactions on the same authentication object with the same timestamp, even at microsecond granularity. In this case, the timestamp fails to capture the SQL execution order.

Our tests show that when the permission SQL execution is followed by the permission change log execution (that is, when changeID is auto-incremented), the order of changeID is exactly the same as the permission SQL execution order. As a result, we choose to apply changes at NN in changeID order.
This should satisfy R3.).

Two transactions:
  T_1 contains two SQL statements: S_1 followed by C_1
  S_1: normal SQL query
  C_1: log SQL change; primary key changeID is auto-incremented

End Result            | Log SQL Order
SQL Execution Order   | C_1 First      | C_2 First
----------------------+----------------+---------------
S_1 First             | C_1=1; C_2=2   | C_1=1; C_2=2
S_2 First             | C_2=1; C_1=2   | C_2=1; C_1=2

Conclusion: changeID is in the same order as the SQL execution order.

Q & A

Question_1: Can we skip applying old updates? For example, the transactions identified by the changeIDs are committed in the following time order: [changeID_1, changeID_3, changeID_2]. When NN gets [changeID_1, changeID_3], it applies them. When it later gets [changeID_1, changeID_3, changeID_2], can it skip re-applying [changeID_1, changeID_3] and apply only changeID_2? That would avoid the overhead of re-applying the old changes.

Answer_1: No. The result is determined by the SQL execution order, which is the same as the changeID order, not by the transaction commit order. The above approach applies changes at NN in the order [changeID_1, changeID_3, changeID_2], which differs from the order in which Sentry applies them, [changeID_1, changeID_2, changeID_3]. This would cause inconsistency between Sentry and NN.

Question_2: Can we hold off applying some changes until the temporary hole is filled, and then apply the changes in changeID order at NN?

Answer_2: A temporary hole could exist for a minute, and holding back changes for a minute may not be desirable. If we wait a minute for every temporary hole, the changes could back up, and NN would be out of sync with Sentry for a while.

Thanks,

Lina

On Thu, Jul 27, 2017 at 10:22 AM, Na Li <lina...@cloudera.com> wrote:

> To keep NN from getting out of sync with Sentry, if NN has skipped multiple
> holes within a time frame, say 10 holes in a day, it will request a full
> snapshot by asking for changeID 0.
> Those parameters are configurable.
>
> On Thu, Jul 27, 2017 at 10:09 AM, Na Li <lina...@cloudera.com> wrote:
>
>> Approach 3) Sentry sends continuous changes
>> 3.1) NN asks for the oldest changeID that is not processed.
>> 3.2) Sentry server sends back the list of changes at and above the
>> requested changeID. It sends back all continuous changes in that list,
>> starting from the requested changeID. If the hole is at the front of
>> the list, it sends back the single change right after the hole.
>> 3.3) When NN gets a single entry without the requested ID, it requests
>> the changeID of the earliest hole. It retries several times (for
>> example, 3).
>> 3.3.1) If it gets the change for the hole, it applies all changes up to
>> the next hole.
>> 3.3.2) If it still does not get it, it skips the hole and applies the
>> changes up to the next hole or the end of the list.
>> 3.4) Repeat 3.1)
>> For example:
>> If NN asks for N, and the Sentry server has the list N, N+2, N+3, NN gets
>> N and applies it. Next time, it asks for N+1. If Sentry has N+1 by that
>> time, it sends NN the list N+1, N+2, N+3. NN applies all of them and
>> moves on. If the Sentry server still does not have N+1, it sends N+2 to
>> NN. NN knows there is a hole (as opposed to no updates since N+1) and
>> retries in the next round. After a number of retries, it assumes N+1 is a
>> permanent hole, applies N+2, and asks for N+3.
>>
>> Pros: a) No change to the NN-to-Sentry server protocol
>> b) No re-apply of the changes
>> c) Reduces the duplicated changes sent to NN when there is a hole
>> d) Can tell whether there is a hole (skip it after multiple retries)
>> or there are no changes from that changeID (keep asking for that
>> changeID)
>>
>> Cons: a) Need to configure the retry number for a hole
>> b) Introduces a delay between the changes and applying them at NN
>> when there is a hole. It could cause a security issue.
>> c) Duplicate changes are sent to NN when there is a hole.
>> d) NN needs to maintain state.
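
To make the Approach 3 exchange concrete, here is a minimal sketch of steps 3.1) to 3.4). It assumes changeIDs are plain integers held in memory; server_response, NameNodeClient and hole_polls are hypothetical names for illustration, not Sentry or NN code.

```python
def server_response(available_ids, requested_id):
    """3.2: return the continuous run starting at requested_id, or, if
    requested_id is missing, the single change right after the hole."""
    ids = sorted(i for i in available_ids if i >= requested_id)
    if not ids:
        return []            # no changes at or above the requested ID
    if ids[0] != requested_id:
        return [ids[0]]      # hole at the front: one change after the hole
    run = [ids[0]]
    for i in ids[1:]:
        if i != run[-1] + 1:
            break            # stop at the next hole
        run.append(i)
    return run


class NameNodeClient:
    """Hypothetical NN-side state: next changeID to ask for and hole counter."""

    def __init__(self, next_id, max_retries=3):
        self.next_id = next_id        # oldest changeID not yet processed
        self.max_retries = max_retries
        self.hole_polls = 0           # polls that found a hole at next_id
        self.applied = []

    def poll(self, available_ids):
        response = server_response(available_ids, self.next_id)
        if not response:
            return                    # nothing new: keep asking for next_id
        if response[0] != self.next_id:
            # 3.3: the requested ID is missing, so ask again next round.
            self.hole_polls += 1
            if self.hole_polls <= self.max_retries:
                return
            # 3.3.2: retries used up, treat the hole as permanent.
        self.applied.extend(response)
        self.next_id = response[-1] + 1
        self.hole_polls = 0


if __name__ == "__main__":
    nn = NameNodeClient(next_id=10)
    nn.poll([10, 12, 13])             # applies 10, then asks for 11
    nn.poll([10, 11, 12, 13])         # 11 arrived: applies 11, 12, 13
    print(nn.applied, nn.next_id)
```

In this sketch the first poll that finds the hole counts toward the limit, so a permanent hole is skipped on the poll after max_retries hole polls; where exactly to draw that line is a design choice the steps above leave open.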
>>
>> On Wed, Jul 26, 2017 at 5:26 PM, Na Li <lina...@cloudera.com> wrote:
>>
>>> Hi,
>>>
>>> Based on test results, we found that transactions fail to commit when
>>> running 2 Sentry servers with 15 concurrent clients issuing 200
>>> GRANTS/REVOKES each. So the current approach of manually increasing
>>> changeID has a serious performance issue.
>>>
>>> We need to develop a solution that performs well and behaves
>>> correctly.
>>>
>>> I list the following approaches; please feel free to provide your
>>> feedback or add more approaches. We need to reach agreement on the
>>> solution soon.
>>>
>>> Approach 1.) NN asks for missing change
>>> 1.1) NN asks for the oldest changeID that is not processed.
>>> 1.2) Sentry server sends back the list of changes at and above the
>>> requested changeID. It sends back all changes in that list even
>>> when there is a hole in the list.
>>> 1.3) When NN finds one or more holes, it puts the following changes in
>>> a buffer and requests the changeID of the earliest hole. It retries
>>> several times (for example, 3).
>>> 1.3.1) If it gets the change for the hole, it applies all changes up to
>>> the next hole.
>>> 1.3.2) If it still does not get it, it skips the hole and applies the
>>> changes up to the next hole or the end of the list.
>>> 1.4) Repeat 1.1)
>>> For example:
>>> If NN gets N, N+2, N+3, it applies N and keeps N+2, N+3 in the buffer.
>>> Next time, it asks for N+1. If Sentry has N+1, it sends NN the list N+1,
>>> N+2, N+3 (N+2, N+3 are sent twice). NN applies all of them and moves on.
>>> If it does not get N+1, NN retries in the next round. After a number of
>>> retries, it assumes N+1 is a permanent hole, applies N+2 and N+3, and
>>> asks for N+4.
>>>
>>> Pros: a) No change to the NN-to-Sentry server protocol
>>> b) No re-apply of the changes
>>>
>>> Cons: a) Need to configure the retry number for a hole
>>> b) Introduces a delay between the changes and applying them at NN
>>> when there is a hole. It could cause a security issue.
>>> c) Duplicate changes are sent to NN when there is a hole.
>>> d) NN needs to maintain state.
>>>
>>> Approach 2.) Sentry sends back old changes and NN replays them
>>> 2.1) NN asks for the newest changeID that is not received.
>>> 2.2) Sentry server sends back the list of changes at and above
>>> [requested changeID - X]. It sends back all changes in that list even
>>> when there is a hole in the list. X is a configurable parameter.
>>> 2.3) NN applies all received changes.
>>> 2.4) Repeat 2.1)
>>> For example:
>>> Suppose X = 10. If NN gets N, N+2, N+3, it applies N, N+2, N+3. Next
>>> time, it asks for N+4. Sentry gets entries equal to or larger than
>>> N+4 - 10 = N-6. If Sentry has N+1, it sends NN the list N-6, N-5, N-4,
>>> N-3, N-2, N-1, N, N+1, N+2, N+3. NN re-applies N-6 to N, applies N+1,
>>> and re-applies N+2 to N+3.
>>>
>>> Pros: a) No change to the NN-to-Sentry server protocol
>>> b) NN keeps no state
>>>
>>> Cons: a) Need to configure X. If X is too small, changes may be missed.
>>> If X is too big, too many duplicate changes are sent to NN and re-applied.
>>> b) Need to make sure re-applying does not cause issues.
>>>
>>> Thanks,
>>>
>>> Lina
>>>
>>>
>>> On Mon, Jul 24, 2017 at 5:19 PM, Alexander Kolbasov <ak...@cloudera.com>
>>> wrote:
>>>
>>>> > Reducing the time between reading max(changeID in DB) and transaction
>>>> > commit will reduce the chance of key conflict. That is the whole
>>>> > point of re-ordering the blocks.
>>>>
>>>> Why would this affect anything? Whenever you read max(changeID) inside
>>>> a transaction you should get exactly the same value since we are using
>>>> repeatable-read transaction isolation level. You will get the value at the
>>>> start of the transaction, not at the time you read it.