michalursa commented on code in PR #13493:
URL: https://github.com/apache/arrow/pull/13493#discussion_r918463232
##########
cpp/src/arrow/compute/exec/partition_util.h:
##########
@@ -118,6 +118,43 @@ class PartitionLocks {
/// \brief Release a partition so that other threads can work on it
void ReleasePartitionLock(int prtn_id);
+ template <typename IS_PRTN_EMPTY_FN, typename PROCESS_PRTN_FN>
+ Status ForEachPartition(size_t thread_id, int* temp_unprocessed_prtns,
Review Comment:
There is no expected threading here. PartitionLocks implements an array of
locks guarding an array of shared objects. To update any of the shared
objects, the lock for that object must be taken first.
ForEachPartition is used when the caller has to update many (potentially
all) partitions. Instead of acquiring the locks in a fixed order and calling
the partition update function in that same order, we randomly pick one of the
unprocessed partitions and try to acquire its lock without blocking. If that
succeeds, we call the partition processing function (PROCESS_PRTN_FN) for that
partition; if it fails, we try a different random partition. This continues
until all partitions are done.
So this is only about a single thread needing to update all partitions of a
shared data structure. Concurrency improves because the thread does not care
about the order in which it processes the partitions, so it can skip a
partition that is currently locked and come back to it later.
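To make the loop concrete, here is a minimal sketch of the idea. It is not the code in this PR: `SimplePartitionLocks`, `TryAcquire`, `Release`, and the `std::mt19937` parameter are hypothetical names chosen for illustration, and it returns `void` rather than `Status` so that it has no Arrow dependency.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Hypothetical stand-in for PartitionLocks: one try-lock flag per partition.
class SimplePartitionLocks {
 public:
  explicit SimplePartitionLocks(int num_prtns)
      : num_prtns_(num_prtns), locks_(new std::atomic<bool>[num_prtns]) {
    for (int i = 0; i < num_prtns; ++i) locks_[i].store(false);
  }

  // Non-blocking acquire: returns true only if this call took the lock.
  bool TryAcquire(int prtn_id) {
    return !locks_[prtn_id].exchange(true, std::memory_order_acquire);
  }

  void Release(int prtn_id) {
    locks_[prtn_id].store(false, std::memory_order_release);
  }

  // Process every non-empty partition exactly once, in no particular order.
  template <typename IsPrtnEmptyFn, typename ProcessPrtnFn>
  void ForEachPartition(std::mt19937& rng, IsPrtnEmptyFn is_prtn_empty,
                        ProcessPrtnFn process_prtn) {
    // Collect the partitions that still need work.
    std::vector<int> unprocessed;
    for (int i = 0; i < num_prtns_; ++i) {
      if (!is_prtn_empty(i)) unprocessed.push_back(i);
    }
    while (!unprocessed.empty()) {
      // Pick a random unprocessed partition.
      std::uniform_int_distribution<std::size_t> pick(0, unprocessed.size() - 1);
      std::size_t slot = pick(rng);
      int prtn_id = unprocessed[slot];
      // If another thread holds the lock, do not wait: draw again.
      // (With one partition left this degenerates into a spin on that lock.)
      if (!TryAcquire(prtn_id)) continue;
      process_prtn(prtn_id);
      Release(prtn_id);
      // Swap-remove the finished partition from the unprocessed list.
      unprocessed[slot] = unprocessed.back();
      unprocessed.pop_back();
    }
  }

 private:
  int num_prtns_;
  std::unique_ptr<std::atomic<bool>[]> locks_;
};
```

In this sketch each calling thread would pass its own random generator and its own callbacks. The only shared state is the lock array, so two threads updating all partitions at once tend to work on different partitions instead of queuing behind each other in a fixed order.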
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.