rtpsw commented on PR #36499:
URL: https://github.com/apache/arrow/pull/36499#issuecomment-1624267164

   > Hmm, this could probably have been fixed more idiomatically by storing a 
std::weak_ptr<RecordBatch> in KeyHasher. And you wouldn't have had to worry 
about concurrency.
   
   Unfortunately, it's not that simple. The concurrency problem is not caused 
by an invalid/dangling pointer to a record batch that a weak pointer would have 
avoided. I'll try to answer both your and Westons's questions.
   
   Some background first. The key hasher is used by the processing thread and 
caches hashes corresponding to a particular record batch. Therefore, the key 
hasher (i.e., its cache of hashes) should be invalidated when the record batch 
being handled by the processing thread changes. The complexity here is that 
this change is affected by the input-receiving thread. The post-PR code 
invalidates the key hasher just before this point using a thread-synchronizing 
operation. This ensures that the cached hashes at this point would not be used 
by the processing thread calling `HashesFor`; instead, `HashesFor` will compute 
the hashes for the new record batch the next time `HashesFor` would be invoked.
   
   Given this background, we can see that the problem is not about an 
invalid/pointer to a record batch but about invalid cached hashes. In the 
pre-PR code, the failure could happen with the following order of operations:
   
   1. The input-receiving thread pushes a record batch to an empty queue, 
placing it at the front, thus making it the current record batch being 
processed.
   2. The processing thread invokes `AdvanceAndMemoize`, which invokes 
`GetLatestKey`, which invokes `GetKey`, which invokes `HashesFor`, which finds 
hashes that are incorrect for the above batch. This causes the problem, which 
is a concurrency one because it is driven by a non-deterministic order of 
operations.
   
   Only thereafter, the processing thread, still in `AdvanceAndMemoize`, 
invokes `Advance`, which deals with the queue and could detect a new record 
batch in order to then invalidate the key hasher, but it is too late due to 
item 2 above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to