zanmato1984 commented on code in PR #45789:
URL: https://github.com/apache/arrow/pull/45789#discussion_r2003564855
##########
cpp/src/arrow/acero/aggregate_internal.cc:
##########
@@ -177,14 +177,11 @@ void AggregatesToString(std::stringstream* ss, const
Schema& input_schema,
*ss << ']';
}
-Status ExtractSegmenterValues(std::vector<Datum>* values_ptr,
- const ExecBatch& input_batch,
+Status ExtractSegmenterValues(std::vector<Datum>& values, const ExecBatch&
input_batch,
const std::vector<int>& field_ids) {
+ DCHECK_EQ(values.size(), field_ids.size());
DCHECK_GT(input_batch.length, 0);
- std::vector<Datum>& values = *values_ptr;
int64_t row = input_batch.length - 1;
- values.clear();
- values.resize(field_ids.size());
for (size_t i = 0; i < field_ids.size(); i++) {
Review Comment:
The thread-safety here is also guaranteed by the assumptions described in
https://github.com/apache/arrow/pull/45789#discussion_r2003072183 : only
non-segmented case allows multi-threading, and then `field_ids` and `values`
must be empty so no assignments.
Note:
1. The race being fixed is `values.clear()` and `values.resize()` which
always get executed even for non-segmented case (multi-threaded). The
assignments never cause real race. My updated fix only works around these
racing calls by taking the advantage of that `values` can be pre-sized (kind of
tricky yeah).
2. The assignments not causing race doesn't make `segmenter_values_`
design-wise right. As also mentioned in
https://github.com/apache/arrow/pull/45789#discussion_r2003072183 , it should
be considered an abstraction leak and should be addressed - just not in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]