rtpsw commented on PR #36499:
URL: https://github.com/apache/arrow/pull/36499#issuecomment-1630293586

   > @rtpsw https://github.com/apache/arrow/pull/36580 is not exactly what I was asking for.
   
   Right, this PR is [the alternative I 
described](https://github.com/apache/arrow/pull/36499#issuecomment-1627706130). 
We could also try the alternative you described, where the hashes are paired 
with the batch, and check its performance, [as 
discussed](https://github.com/apache/arrow/pull/36499#issuecomment-1625532994).
   
   > There are still two threads reading/writing the batch_index variable, which can lead to race conditions / a complex threading model.
   
   There is no batch index variable that is shared between threads. The member 
variable `InputState::batch_index_` is accessed exclusively by the 
input-receiving thread, which uses it to number the incoming record batches. A 
separate value, `NumberedRecordBatch::index`, is transferred via the queue 
(along with the record batch it numbers) to the processing thread, where it is 
copied into a local variable `batch_index` and used from there.
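
   To make the hand-off concrete, here is a minimal standalone sketch of the pattern described above. It is not the code in this PR: `FakeBatch`, `BatchQueue`, and `RunPipeline` are hypothetical names invented for illustration, and `InputState` / `NumberedRecordBatch` are reduced to the fields relevant here. The counter lives only on the input-receiving side, and the processing thread only ever sees the copy that arrives through the queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch, so the sketch needs no Arrow headers.
struct FakeBatch {
  std::string payload;
};

// The index travels together with the batch it numbers.
struct NumberedRecordBatch {
  std::size_t index;
  FakeBatch batch;
};

// Hypothetical minimal blocking queue; an empty optional signals end of input.
class BatchQueue {
 public:
  void Push(std::optional<NumberedRecordBatch> item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push(std::move(item));
    }
    cv_.notify_one();
  }

  std::optional<NumberedRecordBatch> Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    auto item = std::move(items_.front());
    items_.pop();
    return item;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::optional<NumberedRecordBatch>> items_;
};

// Used only by the input-receiving thread, so batch_index_ needs no locking.
class InputState {
 public:
  explicit InputState(BatchQueue* queue) : queue_(queue) {}

  void OnBatch(FakeBatch batch) {
    queue_->Push(NumberedRecordBatch{batch_index_++, std::move(batch)});
  }

  void Finish() { queue_->Push(std::nullopt); }

 private:
  BatchQueue* queue_;
  std::size_t batch_index_ = 0;
};

// Runs the calling thread as the input-receiving thread and spawns a
// processing thread; returns the indices in the order they were processed.
std::vector<std::size_t> RunPipeline(std::size_t num_batches) {
  BatchQueue queue;
  std::vector<std::size_t> seen;

  std::thread processor([&queue, &seen] {
    while (auto item = queue.Pop()) {
      // Local copy of the index that arrived with the batch.
      const std::size_t batch_index = item->index;
      assert(batch_index == seen.size());  // FIFO queue preserves numbering
      seen.push_back(batch_index);
    }
  });

  InputState input(&queue);
  for (std::size_t i = 0; i < num_batches; ++i) {
    input.OnBatch(FakeBatch{"batch " + std::to_string(i)});
  }
  input.Finish();
  processor.join();  // after join, reading `seen` from this thread is safe
  return seen;
}
```

   In this sketch the only state touched by both threads is the queue itself, which is guarded by its mutex. The counter and the local `batch_index` are each confined to a single thread.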


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.