Lina, I don’t understand why it would be any different, but there is no harm in trying.
> On Jul 24, 2017, at 11:50 PM, Na Li <lina...@cloudera.com> wrote:
>
> Sasha,
>
> "How do you know that it improves performance? Do you see the improvement
> in your experiments? If you do see improvements, let’s do the change."
> [Lina] I have not run the test. Vamsee will run the test on a cluster 30
> minutes later today for the current approach with re-ordered transactions.
> So we can see.
>
> "Why would it fail? Most of our transactions fail currently because of the
> conflict on the primary key for delta updates which IMO is happening at
> commit time."
> [Lina] I know. We have three types of failure: whole-transaction failure
> (the one you mentioned), delta transaction block failure, and TransactionBlock
> failure. I am looking at additional individual transaction block failures,
> besides the key-conflict failure of the whole transaction.
>
> "Hmm, I don’t get that. I think changeID is modified in the same
> transaction, so the whole thing should complete."
> [Lina] I think the changeID value is set in DeltaTransactionBlock after getting
> max(changeID in DB). In my proposed approach, the order of events is
> 0) transaction starts
> 1) PM executes TransactionBlock
> 2) PM executes DeltaTransactionBlock (where the changeID value is set on the
> MSentryPathChange instance or MSentryPermChange instance)
> 3) it is saved into the DB at transaction commit.
>
> In the current approach, the order of events is
> 0) transaction starts
> 1) PM executes DeltaTransactionBlock (where the changeID value is set on the
> MSentryPathChange instance or MSentryPermChange instance)
> 2) PM executes TransactionBlock
> 3) it is saved into the DB at transaction commit.
>
> Reducing the time between reading max(changeID in DB) and transaction
> commit reduces the chance of a key conflict. That is the whole point of
> re-ordering the blocks.
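The conflict-window argument above can be sketched as a toy model (not Sentry code; the block names and durations are invented for illustration): the window runs from the max(changeID) read until commit, so executing DeltaTransactionBlock last shrinks it to the delta block's near-constant duration.

```python
def conflict_window(blocks, reading_block):
    """Simulated time from the max(changeID) read (taken at the start of
    reading_block) until commit, which happens after all blocks finish."""
    elapsed = 0.0
    window_start = 0.0
    for name, duration in blocks:
        if name == reading_block:
            window_start = elapsed  # max(changeID) is read here
        elapsed += duration
    return elapsed - window_start   # commit closes the window

ACTUAL_COST = 0.200  # actual transaction block: long and variable (assumed)
DELTA_COST = 0.005   # delta block: short and near constant (assumed)

# Current order: delta block first -> window spans the whole transaction.
current = conflict_window([("delta", DELTA_COST), ("actual", ACTUAL_COST)], "delta")
# Proposed order: delta block last -> window is just the delta block.
proposed = conflict_window([("actual", ACTUAL_COST), ("delta", DELTA_COST)], "delta")
```

Under these assumed durations the proposed ordering cuts the window by roughly the cost of the actual transaction block.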
>
> Thanks,
>
> Lina
>
> On Mon, Jul 24, 2017 at 3:52 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>
>> Lina,
>>
>>> On Jul 24, 2017, at 10:20 PM, Na Li <lina...@cloudera.com> wrote:
>>>
>>> Sasha,
>>>
>>> I found the following approach that could improve the performance of the
>>> current approach and the second approach a lot, with a minor code change.
>>
>> How do you know that it improves performance? Do you see the improvement
>> in your experiments? If you do see improvements, let’s do the change.
>>
>>> Right now, the execution order of the transaction blocks in a transaction is
>>> 1. Delta transaction block to save the perm change and path change
>>> 2. The actual transaction
>>>
>>> We can change the execution order to be
>>> 1. The actual transaction
>>> 2. Delta transaction block to save the perm change and path change
>>>
>>> The benefit of this update is:
>>> The actual transaction has a chance to fail and its execution time may vary a lot.
>>
>> Why would it fail? Most of our transactions fail currently because of the
>> conflict on the primary key for delta updates which IMO is happening at
>> commit time.
>>
>>> The delta transaction has a very low chance to fail and its execution time is
>>> near constant. As a result, the changeID is set after the actual transaction is
>>> done.
>>
>> Hmm, I don’t get that. I think changeID is modified in the same
>> transaction, so the whole thing should complete.
>>
>>> A. If it is applied to approach 2.1) (manually increase changeID: read the
>>> current max value and increase it by 1), the time window exposed to key
>>> collision is small (the delta transaction execution time). Before, the time
>>> window was the sum of the delta transaction and the actual transaction. So key
>>> collisions should happen much less.
>>
>> Can you run some experiments and see whether this is the case or not? This
>> may or may not have some impact - difficult to tell.
>>
>> Thanks,
>>
>> - Sasha
>>
>>> B.
>>> If it is applied to approach 2.2) (auto-increment the changeID), an
>>> actual transaction failure does not create a permanent hole in changeID.
>>> The near-constant execution time of the delta transaction makes it rare to have a
>>> temporary hole. Without permanent holes, if there is a hole, it will be a
>>> temporary one. We should always wait for the hole to be filled and always
>>> send delta changes. If the hole is never filled, the missing transaction was
>>> likely purged without being received by HDFS.
>>>
>>> Thanks,
>>>
>>> Lina
>>>
>>> On Mon, Jul 24, 2017 at 1:03 PM, Na Li <lina...@cloudera.com> wrote:
>>>
>>>> Approach 2.2 uses auto-increment of changeID. When there is a
>>>> temporary hole at the start of the list, even sending a full snapshot could
>>>> skip the transaction in flight. It does not function correctly.
>>>>
>>>> For example, the list is 1,2,3,5,6, and the transaction with changeID 4 is
>>>> not done.
>>>>
>>>> First, Sentry returns 1,2,3.
>>>>
>>>> Next round, 4 is still not done, so Sentry gets the list 5,6 with a hole at
>>>> the front of the list. Sentry sends back the full snapshot with changeID 6.
>>>>
>>>> Then next time, HDFS will ask for changes after 6, and the transaction with
>>>> changeID 4 will be skipped even though it may have completed.
>>>>
>>>> On Mon, Jul 24, 2017 at 12:02 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>>>>
>>>>> Lina,
>>>>>
>>>>> can you describe what problem you have discovered with approach 2.2?
>>>>>
>>>>> On Mon, Jul 24, 2017 at 6:37 PM, Na Li <lina...@cloudera.com> wrote:
>>>>>
>>>>>> Sasha,
>>>>>>
>>>>>> I realized a serious issue with approach 2.2): it does not function
>>>>>> correctly. So I re-iterate the approaches and their pros and cons.
>>>>>>
>>>>>> This leaves us with options 2.1) and 2.3). The question is whether the
>>>>>> performance of approach 2.1) is acceptable for now, so that we can work
>>>>>> on the long-term solution 2.3) later?
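The "continuous delta list" rule of approach 2.2 and the failure case just described can be sketched as follows (a minimal illustration with invented names, not the actual SentryStore code):

```python
def continuous_deltas(available, last_sent):
    """Return ("deltas", prefix) for the gap-free run right after last_sent,
    or ("full_snapshot", max_id) when the hole is at the front of the list."""
    pending = sorted(i for i in available if i > last_sent)
    if not pending or pending[0] != last_sent + 1:
        return ("full_snapshot", max(available))  # hole at the front
    prefix = [pending[0]]
    for i in pending[1:]:
        if i != prefix[-1] + 1:
            break                                 # stop at the first hole
        prefix.append(i)
    return ("deltas", prefix)

db = {1, 2, 3, 5, 6}                    # changeID 4 is still in flight
first = continuous_deltas(db, 0)        # -> ("deltas", [1, 2, 3])
second = continuous_deltas(db, 3)       # -> ("full_snapshot", 6)
# HDFS now resumes from 6, so a later commit of changeID 4 is skipped.
```

This makes the objection concrete: the full snapshot's sequence number (6) leapfrogs the in-flight changeID 4, which is then lost even if it eventually commits.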
>>>>>> >>>>>> 2.1) Find out why we have collision on a single sentry server with >>>>>> synchronization on saving updates. Once find the cause, fix it. >>>>>> + Follow existing approach. Does not introduce big change to the code >>>>> base. >>>>>> + Function correctly. There is no hole in changeID >>>>>> - Need time to investigate why synchronization at application level >> on a >>>>>> single sentry server does not prevent key collision. >>>>>> - Does not scale. All updates are serialized, not much concurrency. >>>>>> - Still have key collision exception and transaction failure when more >>>>> than >>>>>> one Sentry servers are deployed. >>>>>> - Transaction failure at collision increases time to execute a >>>>> transaction. >>>>>> - It is confusing to customer that there are transaction failure in >>>>> normal >>>>>> operation. Increase support cases >>>>>> >>>>>> 2.2) Auto-increment changeID and send delta changes as much as >> possible >>>>>> >>>>>> The patch that achieves 5 times or more performance increase than the >>>>>> current approach. >>>>>> >>>>>> It contains the following changes >>>>>> >>>>>> - revert sentry-1795 (so the changeID is auto-increment. This avoids >>>>> key >>>>>> collision. This is main reason we have performance improvement) >>>>>> - revert sentry-1824 (no need to synchronize when changeID is >>>>>> auto-increment) >>>>>> - get continuous delta list from SentryStore even when the delta >> list >>>>>> has hole (for example, the list is 1,2,3,5,6, return 1,2,3. If the >>>>> hole >>>>>> is >>>>>> at the front of list, return full snapshot) >>>>>> >>>>>> + Relative small changes. Verified working with good performance when >>>>> there >>>>>> is no transaction failure >>>>>> >>>>>> + When the hole in delta list is temporary (transaction in flight), >>>>> return >>>>>> the continuous delta list is effective to deal with the hole. Most >>>>> likely, >>>>>> the hole will disappear next time HDFS requests for changes. 
>>>>>>
>>>>>> - When there is a transaction failure (the hole in changeID is permanent),
>>>>>> it sends back a full snapshot, which is expensive. If we could detect a permanent
>>>>>> hole, then we wouldn't need to send the full snapshot, which is very expensive
>>>>>> and may exhaust memory for big customers.
>>>>>>
>>>>>> - When there is a temporary hole at the start of the list, even sending a
>>>>>> full snapshot could skip the transaction in flight. It does not function
>>>>>> correctly.
>>>>>>
>>>>>> For example, the list is 1,2,3,5,6, and the transaction with changeID 4 is
>>>>>> not done.
>>>>>>
>>>>>> First, Sentry returns 1,2,3.
>>>>>>
>>>>>> Next round, 4 is still not done, so Sentry gets the list 5,6 with a hole at
>>>>>> the front of the list. Sentry sends back the full snapshot with changeID 6.
>>>>>>
>>>>>> Then next time, HDFS will ask for changes after 6, and the transaction with
>>>>>> changeID 4 will be skipped even though it may have completed.
>>>>>>
>>>>>> 2.3) Use timestamps to sort the changes
>>>>>>
>>>>>> a) Use a timestamp, such as MSentryPermChange.createTimeMs or
>>>>>> MSentryPathChange.createTimeMs, to sort the entries. If there is more than
>>>>>> one entry with the same timestamp, use changeID to break the tie.
>>>>>> b) HDFS asks for updates using these timestamp values instead of changeID.
>>>>>> The Sentry server sends back changes at and after this timestamp. HDFS keeps
>>>>>> the list of changeIDs associated with the requested timestamp and skips
>>>>>> entries already processed. This handles the situation where more than
>>>>>> one entry has the same timestamp, and some were sent in the previous request
>>>>>> and some need to be sent in the next request.
>>>>>> c) changeID is the primary key, used only to uniquely identify the entry,
>>>>>> and not required to be sequential or consecutive.
>>>>>> d) Purge the change entries in the DB using timestamps instead of changeID.
>>>>>> For example, keep 3 polling intervals of entries to allow HDFS to get the
>>>>>> changes before they are purged.
>>>>>> + Sentry only sends a full snapshot to HDFS the first time HDFS
>>>>>> starts, and then always sends delta changes to HDFS. For path changes, if
>>>>>> Sentry gets a new full snapshot from HMS, it sends the full path snapshot.
>>>>>> + High concurrency. Scales well with a large number of clients.
>>>>>> - Relatively big code change for the API between the Sentry server and the
>>>>>> Sentry plugin in HDFS.
>>>>>> - No easy way to detect that HDFS has received and processed all updates.
>>>>>> - If a DB admin changes the DB server time backwards, this will mess up the
>>>>>> ordering using timestamps.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Lina
>>>>>>
>>>>>> On Fri, Jul 21, 2017 at 5:53 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>>>>>>
>>>>>>> Lina, Thank you for sending a detailed summary!
>>>>>>>
>>>>>>>> On Jul 22, 2017, at 12:29 AM, Na Li <lina...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Can you provide your feedback on this issue? We need to make a decision soon.
>>>>>>>>
>>>>>>>> 1) Summary
>>>>>>>>
>>>>>>>> The current approach updates changeID (the primary key for permission change
>>>>>>>> and path change) manually. In a single transaction, the code reads the max
>>>>>>>> changeID from the DB, increases it by one, and saves the value in a new change
>>>>>>>> entry. If two threads are adding changes into the DB at the same time, a
>>>>>>>> collision happens (the primary key does not allow multiple entries having the
>>>>>>>> same value), and one transaction fails. Then the failed transaction goes
>>>>>>>> through multiple retries. If the retry count reaches the max value, the
>>>>>>>> transaction fails.
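The read path of approach 2.3 a)-b) might look like the following hedged sketch (invented names, not proposed code): entries are ordered by (createTimeMs, changeID), HDFS requests by timestamp, and HDFS-side dedup of already-seen changeIDs at the boundary timestamp handles ties split across requests.

```python
from collections import namedtuple

# Each change row carries its primary key and creation timestamp.
Change = namedtuple("Change", ["change_id", "create_time_ms"])

def changes_since(entries, since_ms, seen_ids_at_since):
    """Return changes at or after since_ms ordered by (timestamp, changeID),
    skipping changeIDs HDFS already processed at exactly since_ms."""
    picked = [e for e in entries
              if e.create_time_ms > since_ms
              or (e.create_time_ms == since_ms
                  and e.change_id not in seen_ids_at_since)]
    return sorted(picked, key=lambda e: (e.create_time_ms, e.change_id))

entries = [Change(9, 100), Change(7, 100), Change(8, 100), Change(12, 150)]
first = changes_since(entries, 100, set())      # ids 7, 8, 9, 12 in order
later = changes_since(entries, 100, {7, 8, 9})  # only the entry at t=150
```

Note the sketch assumes monotonic DB time; as pointed out below in the thread, a backwards clock change breaks this ordering.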
>>>>>>>>
>>>>>>>> In our stress testing on a single Sentry server, with 15 clients doing
>>>>>>>> grant/revoke operations concurrently, we saw multiple transaction failures,
>>>>>>>> and the exponential-back-off retry increases the latency of every
>>>>>>>> transaction in Sentry. We have a serious performance issue on saving
>>>>>>>> permission and path updates.
>>>>>>>
>>>>>>> How important is the performance issue? Do we expect a high rate of
>>>>>>> permission updates in realistic scenarios? Do we have any idea about the
>>>>>>> currently supported rate of permission updates?
>>>>>>>
>>>>>>> Note that in our single-Sentry-server case the code that updates the DB is
>>>>>>> serialized, and, thus, we shouldn’t see any collisions at the DB level at
>>>>>>> all. This means that there is some other bug hiding here, still
>>>>>>> undiscovered. The retries and exponential backoff shouldn’t apply in the
>>>>>>> single-server case at all.
>>>>>>>
>>>>>>>> 2) Potential solutions
>>>>>>>>
>>>>>>>> 2.1) Find out why we have collisions on a single Sentry server with
>>>>>>>> synchronization on saving updates. Once we find the cause, fix it.
>>>>>>>
>>>>>>> +1 on that
>>>>>>>
>>>>>>>> + Follows the existing approach. Does not introduce a big change to the code base.
>>>>>>>> - Needs time to investigate why synchronization at the application level on a
>>>>>>>> single Sentry server does not prevent key collisions.
>>>>>>>> - Does not scale. All updates are serialized, not much concurrency.
>>>>>>>> - Still has key collision exceptions and transaction failures when more
>>>>>>>> than one Sentry server is deployed.
>>>>>>>> - Transaction failure at collision increases the time to execute a transaction.
>>>>>>>> - It is confusing to customers that there are transaction failures in
>>>>>>>> normal operation.
>>>>>>>> Increases support cases.
>>>>>>>>
>>>>>>>> 2.2) CDH-transactionFailure.01-cdh5-1.5.1.patch
>>>>>>>> The patch achieves a 5-times-or-more performance increase over the
>>>>>>>> current approach.
>>>>>>>> It contains the following changes:
>>>>>>>> revert sentry-1795 (so the changeID is auto-incremented. This avoids key
>>>>>>>> collisions. This is the main reason we have a performance improvement)
>>>>>>>> revert sentry-1824 (no need to synchronize when changeID is
>>>>>>>> auto-incremented)
>>>>>>>> get a continuous delta list from SentryStore even when the delta list has a
>>>>>>>> hole (for example, if the list is 1,2,3,5,6, return 1,2,3. If the hole is at
>>>>>>>> the front of the list, return the full snapshot)
>>>>>>>
>>>>>>> Please document the algorithm you are suggesting in more detail. How do
>>>>>>> you propose to handle synchronization between the NN and the Sentry server
>>>>>>> using auto-incremented IDs?
>>>>>>>
>>>>>>> Performance is an important consideration once we achieve correctness. It
>>>>>>> would be great to have a *correct* solution that also performs well.
>>>>>>>> + Relatively small changes. Verified working with good performance when
>>>>>>>> there is no transaction failure.
>>>>>>>> + When the hole in the delta list is temporary (transaction in flight),
>>>>>>>> returning the continuous delta list is an effective way to deal with the hole.
>>>>>>>> Most likely, the hole will disappear the next time HDFS requests changes.
>>>>>>>>
>>>>>>>> - When there is a transaction failure (the hole in changeID is permanent),
>>>>>>>> it sends back a full snapshot, which is expensive. If we could detect a
>>>>>>>> permanent hole, then we wouldn't need to send the full snapshot, which is very
>>>>>>>> expensive and may exhaust memory for big customers.
>>>>>>>
>>>>>>> Right, sending a full snapshot is extremely expensive. How do you propose to
>>>>>>> detect such a permanent hole?
>>>>>>>> 2.3) Use timestamps to sort the changes
>>>>>>>>
>>>>>>>> a) Use a timestamp, such as MSentryPermChange.createTimeMs or
>>>>>>>> MSentryPathChange.createTimeMs, to sort the entries. If there is more than
>>>>>>>> one entry with the same timestamp, use changeID to break the tie.
>>>>>>>
>>>>>>> IMO we should use either timestamps or changeIDs, not both.
>>>>>>>
>>>>>>>> b) HDFS asks for updates using these timestamp values instead of changeID.
>>>>>>>
>>>>>>> I think that this approach may cause some updates to be lost.
>>>>>>>
>>>>>>>> c) changeID is the primary key, used only to uniquely identify the entry,
>>>>>>>> and not required to be sequential or consecutive.
>>>>>>>> d) Purge the change entries in the DB using timestamps instead of changeID.
>>>>>>>> For example, keep 3 polling intervals of entries to allow HDFS to get the
>>>>>>>> changes before they are purged.
>>>>>>>
>>>>>>> Suppose that you do this purging. Later you restart the NN and it asks for all
>>>>>>> updates since time T, but a bunch of updates since this time were purged,
>>>>>>> so they never reach the NN and are permanently lost. And it is impossible to
>>>>>>> detect that this happened.
>>>>>>>
>>>>>>>> + Sentry only sends a full snapshot to HDFS the first time HDFS
>>>>>>>> starts, and then always sends delta changes to HDFS.
>>>>>>>
>>>>>>> What if HDFS wasn’t able to talk to Sentry for some time and in the mean
>>>>>>> time some entries were purged?
>>>>>>>
>>>>>>>> + High concurrency. Scales well with a large number of clients.
>>>>>>>> - Relatively big code change for the API between the Sentry server and the
>>>>>>>> Sentry plugin in HDFS.
>>>>>>>> - No easy way to detect that HDFS has received and processed all updates.
>>>>>>>>
>>>>>>>> 3) Decisions
>>>>>>>>
>>>>>>>> My suggestion is that we take approach 2.2) for the short term and take the
>>>>>>>> hit of a full snapshot when there is a transaction failure.
>>>>>>>> And take approach 2.3) as the long-term solution.
>>>>>>>
>>>>>>> We can’t make the decision since we do not have enough information about
>>>>>>> approach 2.2 - how does it actually work? This may be a really good
>>>>>>> solution if we can prove that it works.
>>>>>>>
>>>>>>> - Sasha