zanmato1984 commented on code in PR #45789:
URL: https://github.com/apache/arrow/pull/45789#discussion_r2003633794
##########
cpp/src/arrow/acero/aggregate_internal.cc:
##########
@@ -177,14 +177,11 @@ void AggregatesToString(std::stringstream* ss, const Schema& input_schema,
*ss << ']';
}
-Status ExtractSegmenterValues(std::vector<Datum>* values_ptr,
- const ExecBatch& input_batch,
+Status ExtractSegmenterValues(std::vector<Datum>& values, const ExecBatch& input_batch,
const std::vector<int>& field_ids) {
+ DCHECK_EQ(values.size(), field_ids.size());
DCHECK_GT(input_batch.length, 0);
- std::vector<Datum>& values = *values_ptr;
int64_t row = input_batch.length - 1;
- values.clear();
- values.resize(field_ids.size());
for (size_t i = 0; i < field_ids.size(); i++) {
Review Comment:
The function CAN be executed by multiple threads concurrently, as long as the
aggregation is non-segmented (i.e., there are no segment keys, so
`field_ids.size() == values.size() == 0`). In that case, note the
unsynchronized calls to `values.clear()` and `values.resize()` in the old
code: that is what this PR is actually fixing.
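To illustrate the point, here is a minimal standalone sketch of the pattern the new code follows. It is not the Acero implementation: `Batch`, `ExtractLastRow`, and `RunNonSegmented` are hypothetical stand-ins, `int64_t` replaces `Datum`, and plain `assert` replaces `DCHECK_EQ`/`DCHECK_GT`. The assumption is that the caller sizes `values` once before any thread runs, so the function itself never calls `clear()` or `resize()`, and in the non-segmented case it writes nothing at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for an input batch: one vector of values per column.
struct Batch {
  std::vector<std::vector<int64_t>> columns;
  int64_t length;
};

// Copies the last row of the selected columns into `values`.
// Precondition: the caller has already sized `values` to field_ids.size().
// With no segment keys the loop never runs, so `values` is only read.
void ExtractLastRow(std::vector<int64_t>& values, const Batch& batch,
                    const std::vector<int>& field_ids) {
  assert(values.size() == field_ids.size());  // stand-in for DCHECK_EQ
  assert(batch.length > 0);                   // stand-in for DCHECK_GT
  const std::size_t row = static_cast<std::size_t>(batch.length - 1);
  for (std::size_t i = 0; i < field_ids.size(); i++) {
    values[i] = batch.columns[static_cast<std::size_t>(field_ids[i])][row];
  }
}

// Calls the function from several threads against one shared, empty vector,
// mimicking the non-segmented case. Returns the vector's size afterwards.
std::size_t RunNonSegmented(int num_threads) {
  Batch batch;
  batch.columns.push_back({1, 2, 3});
  batch.length = 3;
  const std::vector<int> no_keys;       // non-segmented: no segment keys
  std::vector<int64_t> shared_values;   // sized once (to zero) up front
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; t++) {
    threads.emplace_back([&] { ExtractLastRow(shared_values, batch, no_keys); });
  }
  for (auto& th : threads) th.join();
  return shared_values.size();
}
```

In this sketch the threads only call `size()` on the shared vector, which is a const member function, so no synchronization is needed. If the function still called `clear()` and `resize()` as the old code did, the same concurrent calls would modify the shared vector without synchronization.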
> Sorry for the questions...
No worries. I'm glad to answer :)