Re: SENTRY-1855: PERM/PATH transactions can fail to commit to the sentry database under load

Na Li Sun, 06 Aug 2017 16:19:37 -0700

Hello,

Below is the latest proposed approach for permission change. Path change
will use the existing approach because there is only one thread to get HMS
changes and save into DB.


===============================
Requirements:

+ R1.) Upgrading from Sentry 1.8 to Sentry 2.0 should "do no harm": what is
our upgrade experience, and how does performance compare?
+ R2.) Is the approach scalable in terms of supporting multiple Sentry
servers and concurrent updates
+ R3.) Are the results consistent between Sentry and NN
+ R4.) Does it impact HMS-HA

Constraints from Requirements:

+ C1.) Normal operations should not generate exceptions. The performance of
Sentry 2.0 should be similar to or better than that of Sentry 1.8
+ C2.) The system should work reasonably well as more Sentry servers are
added to the system and handle concurrent updates
+ C3.) The order Sentry applies the changes should be the same order NN
applies the changes, so NN will end up with the same result as Sentry and
they behave consistently

Current Approach for Permission Change:
+ The changeID is the primary key and is updated manually in the Sentry
application. The current max(changeID) is read from the DB, increased by 1,
and then used as the primary key for the new change entry. If more than one
thread saves a new change entry with the same changeID, the transaction in
one thread succeeds, while the transactions in the other threads fail and
are retried with an exponentially increasing retry interval.
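
To make the conflict concrete, here is a minimal single-process sketch of the max(changeID)+1 scheme. It is not the Sentry code: ChangeTable, DuplicateKeyError, save_round and retry_delays are hypothetical names, and the "writers" are simulated sequentially, assuming they all read max(changeID) before any of them commits.

```python
class DuplicateKeyError(Exception):
    """Raised when a changeID is already taken (primary-key conflict)."""


class ChangeTable:
    """In-memory stand-in for the permission change table (hypothetical)."""

    def __init__(self):
        self.rows = {}  # changeID -> payload

    def max_change_id(self):
        return max(self.rows, default=0)

    def insert(self, change_id, payload):
        if change_id in self.rows:
            raise DuplicateKeyError(change_id)
        self.rows[change_id] = payload


def retry_delays(base_ms, max_retries):
    """Exponentially growing wait (in ms) before each retry."""
    return [base_ms * 2 ** attempt for attempt in range(max_retries)]


def save_round(table, payloads):
    """All writers read max(changeID) first, then each tries to commit.

    Returns the payloads that lost the race and have to retry.
    """
    next_id = table.max_change_id() + 1  # every writer computes the same ID
    losers = []
    for payload in payloads:
        try:
            table.insert(next_id, payload)
        except DuplicateKeyError:
            losers.append(payload)
    return losers


table = ChangeTable()
pending = ["grant-A", "grant-B", "revoke-C"]
rounds = 0
while pending:
    pending = save_round(table, pending)
    rounds += 1

print(rounds, table.rows)
print(retry_delays(100, 4))
```

In this simulation only one writer wins each round, so three concurrent changes need three rounds, and every losing writer also waits out a growing delay before its next attempt.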

Benefits of the current approach
+ Satisfies R3.). The changeID is continuously increased. There is no hole
in changeID. Guaranteed consistency for both deltas and full snapshots.
+ Satisfies R4.). It should not affect HMS-HA

Issues with current approach
+ Violates R1.) because normal concurrent transactions can fail due to key
conflicts. When a transaction fails, the exponential retry interval adds
further delay. Once the load reaches a certain point, transactions fail to
commit
+ Violates R2.) because it is not scalable. In our endurance test, we saw
transactions fail to commit after the maximum number of retries with two
Sentry servers

Proposed Approach for Permission Change:
+ The changeID is the primary key and is auto-incremented in the DataStore.
When a transaction fails, it creates a permanent hole in changeID. When a
transaction starts early but commits after transactions with larger
changeIDs, it forms a temporary hole in changeID. Once this transaction is
committed, the temporary hole disappears. How long a temporary hole can
exist is limited by how long a transaction can be pending and still be
committed successfully.

0.) Read <write_timeout> from the configuration and set it on the datastore
so that transactions taking longer than that value time out. In this way, we
can limit how long a transaction can be pending and still be committed
successfully. <transaction buffer time> = <buffer factor> * <write_timeout>,
where <buffer factor> defaults to 2. This is how far back in time we go to
capture temporary holes in changeID.
1.) Change the MSentryPermChange timestamp to be assigned by DB, not from
sentry server. In this way, we have a single source of truth for the
timestamp of the permission change entries.
    1.a) Need to make sure the timestamp (using the @CreateTimestamp
annotation in datanucleus) is set by the DB, not by datanucleus using server
time. If necessary, this can be done without @CreateTimestamp by just
setting a default value of CURRENT_TIMESTAMP(6) on the database table column
and not setting anything on the object. If the annotation can handle it,
great, but we have to make sure of the semantics.
2.) When Sentry sends changes to NN, it also includes the "current time in
DB" (Now).
3.) When NN asks for changes, it sends to Sentry the "last received
'current time in DB'" (referred to as Last_Now) together with the last
processed changeID for permission changes and [imageID, changeID] for path
changes
4.) When Sentry receives the request, it sends all entries that are newer
than the time = Last_Now - <transaction buffer time>. NN applies the
changes in increasing order of changeID
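
Steps 0.) to 4.) can be sketched as below. This is only an illustration under stated assumptions, not the actual Sentry/NN code: PermChange, sentry_get_changes and transaction_buffer_time are hypothetical names, the committed_at field exists only to simulate when a row becomes visible, and the last processed changeID that NN also sends is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermChange:
    change_id: int       # auto-incremented by the DB
    db_time: float       # timestamp assigned by the DB, in seconds
    committed_at: float  # simulation only: when the row becomes visible


def transaction_buffer_time(write_timeout, buffer_factor=2):
    """<transaction buffer time> = <buffer factor> * <write_timeout>."""
    return buffer_factor * write_timeout


def sentry_get_changes(all_changes, last_now, db_now, write_timeout):
    """Return (changes, Now) for a request that carries Last_Now.

    Sends every committed entry newer than Last_Now - buffer time,
    sorted by changeID so NN can apply them in increasing order.
    """
    cutoff = last_now - transaction_buffer_time(write_timeout)
    visible = [c for c in all_changes
               if c.committed_at <= db_now and c.db_time > cutoff]
    return sorted(visible, key=lambda c: c.change_id), db_now


WRITE_TIMEOUT = 30.0
log = [
    PermChange(1, db_time=100.0, committed_at=101.0),
    PermChange(2, db_time=105.0, committed_at=130.0),  # temporary hole
    PermChange(3, db_time=110.0, committed_at=111.0),
]

# First poll at DB time 120: changeID 2 is still pending.
first, now_1 = sentry_get_changes(log, 0.0, 120.0, WRITE_TIMEOUT)
# Second poll at DB time 150: NN sends back Last_Now = 120.
second, now_2 = sentry_get_changes(log, now_1, 150.0, WRITE_TIMEOUT)

first_ids = [c.change_id for c in first]
second_ids = [c.change_id for c in second]
print(first_ids, second_ids)
```

In the second poll a cutoff of exactly Last_Now (120) would miss changeID 2, whose DB timestamp is 105. Going back by the buffer time picks it up, at the cost of sending changeID 1 and 3 again.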

Benefits of the proposed approach:
+ Satisfies requirements R1.), R2.), R3.) and R4.). It should not affect
HMS-HA

Issues with Proposed Approach:
+ More complicated to implement than the current approach.
+ Requires changing the protocol API between NN and Sentry to pass the
"current time in DB"
+ Needs to re-apply some old updates. This may cause more overhead, but it
should be reasonable. We can optimize how old updates are re-applied. That
will be done in the next release, as the change will be local to NN.

Background on R3.)
There are several types of transaction orders: TO1) The order of
transaction starting time; TO2) The order of permission SQL execution time;
TO3) The order of transaction commit time; What order affects the result?
NN should apply perm changes in the order that determines the result, so it
will behave consistently with Sentry.

Our testing result shows that the TO2) is the order that determines the
result.
Two transactions, T_1 and T_2:
T_n: Transaction #n
S_n: SQL in Transaction #n
V_n: Value of Transaction #n

End Result
                      | Transaction Commit Order
SQL Execution Order   | T_1 First      | T_2 First
----------------------+----------------+----------------
S_1 First             | V_2            | V_2
S_2 First             | V_1            | V_1

Conclusion: SQL execution order determines the end result

The next question is how to capture the permission SQL execution order. One
approach is to get the permission SQL execution timestamp and save it to the
permission change entry. This brings two issues: a) It requires a lot of
code changes. b) It is possible to have two transactions on the same
authentication object with the same timestamp, even at microsecond
granularity. In this case, the timestamp fails to capture the SQL execution
order.

Our test shows that when the permission SQL execution is followed by the
permission change log execution (that is, when changeID is auto-incremented),
the order of changeID is exactly the same as the permission SQL execution
order. As a result, we choose to apply changes at NN in changeID order. This
should satisfy R3.).

Two transactions, T_1 and T_2:
T_1 contains two SQL statements: S_1 followed by C_1
S_1: normal SQL query
C_1: log SQL change, primary key changeID is auto-incremented

End Result
                      | Log SQL Order
SQL Execution Order   | C_1 First      | C_2 First
----------------------+----------------+----------------
S_1 First             | C_1=1; C_2=2   | C_1=1; C_2=2
S_2 First             | C_2=1; C_1=2   | C_2=1; C_1=2

Conclusion: changeID is in the same order as the SQL execution order

Q & A
Question_1: Can we skip applying old updates? For example, the transactions
identified by the changeID are committed in the following chronological
order: [changeID_1, changeID_3, changeID_2]. When NN gets [changeID_1,
changeID_3], it applies them. When it later gets [changeID_1, changeID_3,
changeID_2], can it skip re-applying [changeID_1, changeID_3] and only
apply changeID_2? That would avoid the overhead of re-applying the old
changes.

Answer_1: NO. The result is determined by the SQL execution order, which is
the same as the changeID order. It is not the transaction commit order. The
above approach applies changes at NN in the order [changeID_1, changeID_3,
changeID_2], which is different from the order in which Sentry applies the
changes, [changeID_1, changeID_2, changeID_3]. This will cause inconsistency
between Sentry and NN.
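
A toy illustration of Answer_1, assuming three changes that all touch the same privilege and a simple last-write-wins model. The apply_changes helper, the key string and the values are made up for this example.

```python
def apply_changes(state, changes, order):
    """Apply the changes with the given changeIDs in order; last write wins."""
    new_state = dict(state)
    for change_id in order:
        key, value = changes[change_id]
        new_state[key] = value
    return new_state


# changeID -> (privilege, resulting value); all hit the same privilege.
changes = {
    1: ("role_x:db1.t1", "granted"),
    2: ("role_x:db1.t1", "granted"),
    3: ("role_x:db1.t1", "revoked"),
}

# Sentry: the result follows SQL execution order, i.e. changeID order.
sentry = apply_changes({}, changes, [1, 2, 3])

# NN first receives [1, 3] because changeID 2 commits late.
nn_first = apply_changes({}, changes, [1, 3])

# Option A: skip what was already applied and apply only changeID 2.
nn_skip = apply_changes(nn_first, changes, [2])

# Option B: re-apply everything received, in changeID order.
nn_replay = apply_changes(nn_first, changes, [1, 2, 3])

print(sentry, nn_skip, nn_replay)
```

With Option A, NN ends up with the privilege granted while Sentry has it revoked. With Option B, NN matches Sentry.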

Question_2: Can we hold off applying some changes until the temporary hole
is filled, and then apply the changes in changeID order at NN?
Answer_2: The temporary hole could exist for a minute. Holding off applying
changes for a minute may not be desirable. If we wait a minute for every
temporary hole, the changes could back up, and NN will be out of sync with
Sentry for a while.

Thanks,

Lina

On Thu, Jul 27, 2017 at 10:22 AM, Na Li <lina...@cloudera.com> wrote:

> To avoid NN getting out of sync with Sentry, if NN has skipped multiple
> holes within a time frame, say 10 holes in a day, it will request a full
> snapshot by asking for changeID 0. Those parameters are configurable
>
> On Thu, Jul 27, 2017 at 10:09 AM, Na Li <lina...@cloudera.com> wrote:
>
>> Approach 3) Sentry sends continuous changes
>> 3.1) NN asks for the oldest changeID that is not processed.
>> 3.2) Sentry server takes the list of changes at and above the requested
>> changeID and sends back all continuous changes in that list starting
>> from the requested changeID. If the hole is at the front of the list, it
>> sends back the single change right after the hole.
>> 3.3) When NN gets a single entry without the requested ID, it requests
>> the changeID of the earliest hole. It retries several times (for
>> example 3).
>>    3.3.1) If it gets the change for the hole, it applies all changes up to
>> the next hole.
>>    3.3.2) If it still does not get it, it skips the hole and applies the
>> changes up to the next hole or the end of the list.
>> 3.4) Repeat 3.1)
>> For example:
>> If NN asks for N, and Sentry server has a list of N, N+2, N+3, NN gets N
>> and applies it. Next time, it asks for N+1. If Sentry has N+1 by that time,
>> it sends NN the list N+1, N+2, N+3. NN applies all of them and moves on.
>> If Sentry server still does not have N+1, it sends N+2 to NN. NN knows
>> there is a hole (as opposed to there being no updates from N+1 onward), so
>> NN retries next round. After a number of retries, it assumes N+1 is a
>> permanent hole, applies N+2, and asks for N+3.
>>
>> Pros: a) No change to NN to Sentry server protocol
>>          b) No re-apply of the changes
>>          c) Reduce the duplicated changes sent to NN when there is a hole
>>          d) Can tell whether there is a hole (skip it after multiple
>> retries) or there are no changes from that changeID on (keep asking for
>> that changeID)
>>
>> Cons: a) Need to configure the retry number for a hole
>>           b) Introduce delay between the changes and applying them at NN
>> when there is a hole. It could cause security issue.
>>           c) Duplicate changes are sent to NN when there is a hole.
>>           d) NN needs to maintain state.
>>
>> On Wed, Jul 26, 2017 at 5:26 PM, Na Li <lina...@cloudera.com> wrote:
>>
>>> Hi,
>>>
>>> Based on testing results, we found that transactions fail to commit when
>>> running 2 Sentry servers with 15 concurrent clients issuing 200
>>> GRANTS/REVOKES each. So the current approach of manually increasing
>>> changeID has a serious performance issue.
>>>
>>> We need to develop a solution that performs well and behaves
>>> correctly.
>>>
>>> I list the following approaches; please feel free to provide your
>>> feedback or add more approaches. We need to reach agreement on the solution
>>> soon.
>>>
>>> Approach 1.) NN asks for missing change
>>> 1.1) NN asks for the oldest changeID that is not processed.
>>> 1.2) Sentry server sends back the list of changes at and above the
>>> requested changeID. Sentry server sends back all changes in that list even
>>> when there is a hole in the list.
>>> 1.3) When NN finds one or more holes, it puts the following changes in a
>>> buffer and requests the changeID of the earliest hole. It retries several
>>> times (for example 3).
>>>    1.3.1) If it gets the change for the hole, it applies all changes up to
>>> the next hole.
>>>    1.3.2) If it still does not get it, it skips the hole and applies the
>>> changes up to the next hole or the end of the list.
>>> 1.4) Repeat 1.1)
>>> For example:
>>> If NN gets N, N+2, N+3, it applies N and keeps N+2, N+3 in the buffer. Next
>>> time, it asks for N+1. If Sentry has N+1, it sends NN the list N+1,
>>> N+2, N+3 (N+2, N+3 are sent twice). NN applies all of them and moves on.
>>> If it does not get N+1, NN retries next round. After a number of retries,
>>> it assumes N+1 is a permanent hole, applies N+2 and N+3, and asks for N+4.
>>>
>>> Pros: a) No change to NN to Sentry server protocol
>>>          b) No re-apply of the changes
>>>
>>> Cons: a) Need to configure the retry number for a hole
>>>           b) Introduce delay between the changes and applying them at NN
>>> when there is a hole. It could cause security issue.
>>>           c) Duplicate changes are sent to NN when there is a hole.
>>>           d) NN needs to maintain state.
>>>
>>> Approach 2.) Sentry sends back old changes and NN replay
>>> 2.1) NN asks for the newest changeID that is not received.
>>> 2.2) Sentry server sends back the list of changes at and above the
>>> requested [changeID - X]. Sentry server sends back all changes in that list
>>> even when there is a hole in the list. X is a configurable parameter
>>> 2.3) NN applies all received changes.
>>> 2.4) Repeat 2.1)
>>> For example:
>>> Suppose X = 10. If NN gets N, N+2, N+3, it applies N, N+2, N+3. Next
>>> time, it asks for N+4. Sentry gets entries equal to or larger than
>>> N+4 - 10 = N-6. If Sentry has N+1, it sends NN the list N-6, N-5, N-4, N-3,
>>> N-2, N-1, N, N+1, N+2, N+3. NN re-applies N-6 through N, applies N+1, and
>>> re-applies N+2 through N+3.
>>>
>>> Pros: a) No change to NN to Sentry server protocol
>>>          b) NN keeps no state
>>>
>>> Cons: a) Need to configure X. If X is too small, NN may miss changes.
>>> If X is too big, too many duplicate changes are sent to NN and re-applied
>>>           b) Need to make sure re-applying does not cause issues.
>>>
>>> Thanks,
>>>
>>> Lina
>>>
>>>
>>> On Mon, Jul 24, 2017 at 5:19 PM, Alexander Kolbasov <ak...@cloudera.com>
>>> wrote:
>>>
>>>>
>>>> >
>>>> > Reducing the time between reading max(changeID in DB) and transaction
>>>> > commit will reduce the chance of key conflict. That is the whole
>>>> > point of re-ordering the blocks.
>>>> >
>>>>
>>>>
>>>> Why would this affect anything? Whenever you read max(changeID) inside
>>>> a transaction you should get exactly the same value since we are using
>>>> repeatable-read transaction isolation level. You will get the value at the
>>>> start of the transaction, not at the time you read it.
>>>>
>>>>
>>>>
>>>
>>
>
