nickva opened a new issue #3560:
URL: https://github.com/apache/couchdb/issues/3560


   In the presence of retryable errors emitted by FDB, it's possible for new 
document inserts to return a conflict (409) instead of a 201 response. This 
happens when the `1021` error code is emitted along with other retryable codes.
   
   `1021` is `commit_unknown_result` and essentially means we don't know whether 
the transaction committed or not. In general, we have a way to [mitigate 
this](https://github.com/apache/couchdb/blob/main/src/fabric/src/fabric2_fdb.erl#L186-L191): 
we write a unique transaction ID to a subspace and then, before each 
transaction attempt starts, we check the previous error code. If the 
transaction ID is present, we know the previous attempt committed, and we 
return the saved result from it (so for doc updates we'd return a 201, for 
example).
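
The mitigation described above can be illustrated with a toy model. This is a minimal sketch in Python, not the `fabric2_fdb` code: `ToyStore`, `insert_doc`, and `lose_first_ack` are hypothetical names, the "subspace" is just a set, and the simulated `1021` is assumed to always hide a commit that actually landed.

```python
import uuid

COMMIT_UNKNOWN_RESULT = 1021  # FDB error code discussed in this issue


class ToyStore:
    """In-memory stand-in for the database (hypothetical, not the FDB API)."""

    def __init__(self):
        self.docs = {}
        self.tx_ids = set()  # plays the role of the transaction ID subspace


def insert_doc(store, doc_id, lose_first_ack=False):
    """Insert doc_id; optionally simulate a 1021 after the first commit lands."""
    tx_id = uuid.uuid4().hex  # one unique ID for the whole request
    last_error = None
    saved_result = None
    while True:
        # Before re-running, check whether the previous attempt committed.
        if last_error == COMMIT_UNKNOWN_RESULT and tx_id in store.tx_ids:
            return saved_result
        if doc_id in store.docs:
            return 409  # conflict: the document already exists
        # "Commit": the document and the transaction ID are written together.
        store.docs[doc_id] = {}
        store.tx_ids.add(tx_id)
        saved_result = 201
        if lose_first_ack:
            # The commit landed, but the client only sees commit_unknown_result.
            lose_first_ack = False
            last_error = COMMIT_UNKNOWN_RESULT
            continue
        return saved_result
```

With a single simulated `1021`, `insert_doc(ToyStore(), "doc1", lose_first_ack=True)` returns 201: the retry finds its own transaction ID and re-uses the saved result instead of reporting a conflict.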
   
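   However, that doesn't work if another retryable error code is emitted in the 
same request. The check relies on the previous error code, so a different 
retryable error following the `1021` hides it from the next attempt, and the 
retried insert presumably conflicts with the document its own earlier attempt 
already wrote.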
   
   A few examples, captured by logging the error code in `on_error`:
   
   ```
   [error] 2021-05-11T05:01:39.433449Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1042 +++++++++++
   [error] 2021-05-11T05:01:39.435380Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1009 +++++++++++
   [error] 2021-05-11T05:01:39.455958Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1021 +++++++++++
   [error] 2021-05-11T05:01:39.472909Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1009 +++++++++++
   [notice] 2021-05-11T05:01:39.489403Z [email protected] <0.517.0> b6a156aa76 127.0.0.1:15984 127.0.0.1 adm POST /random-test-db--576460743007172517--576460752303423487 409 ok 63
   ```
   
   Another example:
   
   ```
   [error] 2021-05-11T05:11:50.210820Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1021 +++++++++++
   [error] 2021-05-11T05:11:50.218575Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1009 +++++++++++
   [error] 2021-05-11T05:11:50.244467Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1021 +++++++++++
   [notice] 2021-05-11T05:11:50.266724Z [email protected] <0.498.0> 77d4be8e88 127.0.0.1:15984 127.0.0.1 adm POST /random-test-db--576460743366880824--576460752303423488 409 ok 7
   ```
   
   The glossary of error codes involved:
   
   ```
   commit_unknown_result 1021
   future_version 1009
   proxy_memory_limit_exceeded 1042
   ```
   
   I think we probably want a longer "memory" of the previous 
`commit_unknown_result` state for the request, so that we keep re-using the 
saved success result on subsequent retries.
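
To make the idea concrete, here is a small simulation comparing "remember only the last error" with a sticky per-request memory. It is a sketch under stated assumptions, not a patch: `run_insert` and its `sticky` flag are hypothetical, every simulated `1021` is assumed to hide a commit that actually succeeded, and every other retryable code is assumed to abort an attempt before the transaction ID check can complete.

```python
COMMIT_UNKNOWN_RESULT = 1021


def run_insert(errors, sticky):
    """Replay one insert request against a toy store.

    errors: retryable codes hit by successive attempts, in order.
    sticky: if True, remember an earlier 1021 for the whole request;
            if False, only look at the immediately preceding error.
    """
    doc_exists = False      # toy database state
    tx_id_written = False   # marker written together with the document
    saved_result = None
    last_error = None
    seen_unknown = False
    pending = list(errors)
    while True:
        if pending and pending[0] != COMMIT_UNKNOWN_RESULT:
            # Assumed: this error aborts the attempt before any check or write.
            last_error = pending.pop(0)
            continue
        if sticky:
            unknown = seen_unknown
        else:
            unknown = last_error == COMMIT_UNKNOWN_RESULT
        if unknown and tx_id_written:
            return saved_result  # re-use the result of the earlier commit
        if doc_exists:
            return 409           # conflict with our own earlier write
        doc_exists = True
        tx_id_written = True
        saved_result = 201
        if pending:
            # Only a 1021 can be next here: the commit landed, the ack was lost.
            last_error = pending.pop(0)
            seen_unknown = True
            continue
        return saved_result
```

Replaying the first logged sequence, `run_insert([1042, 1009, 1021, 1009], sticky=False)` yields 409 in this model, while `sticky=True` yields 201. A lone `1021` yields 201 either way, matching the case the existing mitigation already handles.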
   
   This was discovered while running the test suite with the client buggify 
options turned on:
   
   ```
   make buggify-elixir-suite
   ```
   
   That may have to be run multiple times until buggify picks `1021` as the 
error it will periodically throw.



