tobim opened a new pull request, #48128: URL: https://github.com/apache/arrow/pull/48128
### Rationale for this change Before this change it was possible for two threads calling `field()` with the same index at the same time to cause a race on the stored entry in `boxed_fields_`. I.e. if a second thread goes into the path that calls `MakeArray` before the first thread stored its own new array, the second thread would also write to the same `shared_ptr` and invalidate the `shared_ptr` from the first thread, thereby also invalidating the returned reference. ### What changes are included in this PR? This PR changes the return type of StructArray::field() from `shared_ptr<Array>&` to `shared_ptr<Array>` giving the caller co-ownership of the data and safeguarding against any potential concurrent writes to the underlying `boxed_fields_` vector. It also changes the body to use the CAS pattern to avoid multiple concurrent writes to the same address. ### Are these changes tested? I don't know how to write a deterministic test that triggers the issue before the fix. Even a non-deterministic test needs to run with address sanitizer or valgrind or something similar. I can however confirm that this change fixes an issue that I've been debugging in https://github.com/tenzir/tenzir. ### Are there any user-facing changes? While changing `StructArray::field()` to return by value is an API change, I believe this should be compatible with regular uses of that function. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
