tobim opened a new pull request, #48128:
URL: https://github.com/apache/arrow/pull/48128

   ### Rationale for this change
   
   Before this change it was possible for two threads calling `field()` with 
the same index at the same time to cause a race on the stored entry in 
`boxed_fields_`. I.e. if a second thread goes into the path that calls 
`MakeArray` before the first thread stored its own new array, the second thread 
would also write to the same `shared_ptr` and invalidate the `shared_ptr` from 
the first thread, thereby also invalidating the returned reference.
   
   ### What changes are included in this PR?
   
   This PR changes the return type of StructArray::field() from 
`shared_ptr<Array>&` to `shared_ptr<Array>` giving the caller co-ownership of 
the data and safeguarding against any potential concurrent writes to the 
underlying `boxed_fields_` vector.
   It also changes the body to use the CAS pattern to avoid multiple concurrent 
writes to the same address.
   
   ### Are these changes tested?
   
   I don't know how to write a deterministic test that triggers the issue 
before the fix. Even a non-deterministic test needs to run with address 
sanitizer or valgrind or something similar.
   
   I can however confirm that this change fixes an issue that I've been 
debugging in https://github.com/tenzir/tenzir.
   
   ### Are there any user-facing changes?
   
   While changing `StructArray::field()` to return by value is an API change, I 
believe this should be compatible with regular uses of that function.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to