nickva opened a new issue #3560: URL: https://github.com/apache/couchdb/issues/3560
In the presence of retryable errors emitted by FDB, it's possible for new document inserts to return a conflict (409) instead of a 201 response. This happens when the `1021` error code is emitted along with other retryable codes. `1021` is `commit_unknown_result` and essentially means we don't know whether the transaction committed or not.

In general, we have a way to [mitigate this](https://github.com/apache/couchdb/blob/main/src/fabric/src/fabric2_fdb.erl#L186-L191) by writing a unique transaction ID to a subspace and then, before each transaction attempt starts, checking the previous error code. If that code was `1021` and the transaction ID is present, we know the previous attempt committed and we return its saved result (so for doc updates we'd return a 201, for example). However, that doesn't work if another retryable error code is emitted in the same request.

A few examples, obtained by logging the error code passed to `on_error`:

```
[error] 2021-05-11T05:01:39.433449Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1042 +++++++++++
[error] 2021-05-11T05:01:39.435380Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1009 +++++++++++
[error] 2021-05-11T05:01:39.455958Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1021 +++++++++++
[error] 2021-05-11T05:01:39.472909Z [email protected] <0.517.0> b6a156aa76 ++++++++++++++ ERROR CODE 1009 +++++++++++
[notice] 2021-05-11T05:01:39.489403Z [email protected] <0.517.0> b6a156aa76 127.0.0.1:15984 127.0.0.1 adm POST /random-test-db--576460743007172517--576460752303423487 409 ok 63
```

Another example:

```
[error] 2021-05-11T05:11:50.210820Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1021 +++++++++++
[error] 2021-05-11T05:11:50.218575Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1009 +++++++++++
[error] 2021-05-11T05:11:50.244467Z [email protected] <0.498.0> 77d4be8e88 ++++++++++++++ ERROR CODE 1021 +++++++++++
[notice] 2021-05-11T05:11:50.266724Z [email protected] <0.498.0> 77d4be8e88 127.0.0.1:15984 127.0.0.1 adm POST /random-test-db--576460743366880824--576460752303423488 409 ok 7
```

The glossary of error codes involved:

```
commit_unknown_result 1021
future_version 1009
proxy_memory_limit_exceeded 1042
```

I think we probably want to have a longer "memory" of the previous `commit_unknown_result` state for the request, and then keep reusing the success result on subsequent retries.

This was discovered while running the test suite with the client buggify options turned on:

```
make buggify-elixir-suite
```

That may have to be run multiple times until buggify picks `1021` as the activated error that it will periodically throw.
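To make the failure mode concrete, here is a toy Python simulation of the retry loop. It is only a sketch and not the `fabric2_fdb` code: `run_insert`, the `sticky` flag, the dict standing in for the FDB keyspace, and the fault-injection scheme are all hypothetical. It assumes that every `1021` fires after the commit was actually applied, and that the other retryable codes surface at the start of an attempt. With `sticky=False` only the immediately preceding error code is consulted, as described above; with `sticky=True` the request remembers that it has seen `1021` at any earlier point.

```python
COMMIT_UNKNOWN_RESULT = 1021


class RetryableError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


def run_insert(injected_faults, sticky):
    """Simulate one document insert request; return the HTTP status it yields.

    injected_faults: error codes to inject, one per attempt (hypothetical scheme).
    sticky: if True, remember any earlier 1021 for the whole request.
    """
    data = {}                  # toy stand-in for the FDB keyspace
    faults = list(injected_faults)
    tx_id = "tx-1"             # unique transaction ID, fixed for the request
    last_error = None
    saw_unknown = False        # the longer "memory" proposed in the issue
    saved_result = None

    while True:
        fault = faults.pop(0) if faults else None
        try:
            # Assumption: non-1021 faults hit before the attempt does any work.
            if fault is not None and fault != COMMIT_UNKNOWN_RESULT:
                raise RetryableError(fault)

            if sticky:
                may_have_committed = saw_unknown
            else:
                may_have_committed = last_error == COMMIT_UNKNOWN_RESULT
            if may_have_committed and tx_id in data:
                # An earlier attempt committed: reuse its saved result.
                return saved_result

            if "doc" in data:
                result, writes = 409, {}   # conflicts with our own earlier write
            else:
                result, writes = 201, {"doc": "body", tx_id: True}
            saved_result = result

            data.update(writes)            # the commit is applied...
            if fault == COMMIT_UNKNOWN_RESULT:
                raise RetryableError(fault)  # ...but the client cannot tell
            return result
        except RetryableError as err:
            last_error = err.code
            saw_unknown = saw_unknown or err.code == COMMIT_UNKNOWN_RESULT


if __name__ == "__main__":
    for faults in ([1042, 1009, 1021, 1009], [1021, 1009, 1021]):
        print(faults, run_insert(faults, sticky=False), run_insert(faults, sticky=True))
```

Under these assumptions, both error sequences from the logs above end in a 409 when only the last error code is checked, and in a 201 once the `1021` state is remembered for the whole request.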
