ZhangHuiGui opened a new issue, #40431: URL: https://github.com/apache/arrow/issues/40431
### Describe the bug, including details regarding any error messages, version, and platform.

This issue is similar to https://github.com/apache/arrow/pull/40007, but it is a different problem.

I want to use the `Hashing32::HashBatch` API to produce an array of hashes for a batch. Although `Hashing32` and `Hashing64` are used in the join-related code, they can also be used independently, as in the code below:

```cpp
auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
const int batch_len = arr->length();
arrow::compute::ExecBatch exec_batch({arr}, batch_len);
auto ctx = arrow::compute::default_exec_context();

arrow::util::TempVectorStack stack;
// Allocate only as much stack as the batch itself seems to need.
ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));

std::vector<uint32_t> hashes(batch_len);
std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
ASSERT_OK(arrow::compute::Hashing32::HashBatch(
    exec_batch, hashes.data(), temp_column_arrays,
    ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));
```

The crash stack in `HashBatch` is:

```shell
arrow::compute::Hashing32::HashBatch
arrow::compute::Hashing32::HashMultiColumn
arrow::util::TempVectorHolder<unsigned int>::TempVectorHolder
arrow::util::TempVectorStack::alloc
ARROW_DCHECK(top_ <= buffer_size_);  // top_=4176, buffer_size_=160
```

The cause is the following code:

https://github.com/apache/arrow/blob/7e286dd004a8fcf2de0f58615793338076741208/cpp/src/arrow/compute/key_hash.cc#L385-L387

The holder uses `max_batch_size`, which is `1024`, as its `num_elements`. That is far more than the `buffer_size` the temp stack was initialized with.

I know that `HashBatch` is currently only used in hash-join and related code. For joins, the input has already been split at a higher level, which ensures that each input batch has at most `kMiniBatchLength` rows and that the stack is big enough. But the API can also be used independently, so maybe we could use `num_rows` rather than `util::MiniBatch::kMiniBatchLength` in the `HashBatch`-related APIs?
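To make the size mismatch concrete, here is a minimal, self-contained sketch of a bump-style temp stack. It is a toy model, not Arrow's `TempVectorStack`: the names `toy::TempStack`, `PaddedSize`, `kPadding` and `kGuardBytes` are hypothetical, and the padding and guard sizes are assumptions chosen for illustration. With these assumed constants, a stack initialized for three `uint32_t` values gets a 160-byte buffer, and a request for 1024 `uint32_t` elements would need the top to move to 4176 bytes, which matches the numbers in the failed check above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: NOT Arrow's implementation. All names and constants
// below are assumptions made for illustration.
namespace toy {

constexpr int64_t kPadding = 64;                       // assumed per-allocation padding
constexpr int64_t kGuardBytes = 2 * sizeof(uint64_t);  // assumed guard words per allocation

// Round up to a multiple of 8 bytes, then add the assumed padding.
inline int64_t PaddedSize(int64_t num_bytes) {
  return ((num_bytes + 7) / 8) * 8 + kPadding;
}

struct TempStack {
  int64_t buffer_size = 0;
  int64_t top = 0;

  // Size the buffer from the number of bytes the caller asked for.
  void Init(int64_t requested_bytes) {
    assert(requested_bytes >= 0);
    buffer_size = PaddedSize(requested_bytes) + kPadding + kGuardBytes;
    top = 0;
  }

  // Bump allocation. Returns false in the situation where a debug check
  // of the form `top <= buffer_size` would fire.
  bool Alloc(int64_t num_bytes) {
    const int64_t new_top = top + PaddedSize(num_bytes) + kGuardBytes;
    if (new_top > buffer_size) return false;
    top = new_top;
    return true;
  }
};

// Top position that a single temp vector of `num_elements` uint32_t values
// would require on an empty stack.
inline int64_t RequiredTop(int64_t num_elements) {
  return PaddedSize(num_elements * static_cast<int64_t>(sizeof(uint32_t))) + kGuardBytes;
}

}  // namespace toy
```

In this model, `RequiredTop(1024)` exceeds the buffer of a stack initialized for three elements, while `RequiredTop(3)` fits. Even with per-row sizing, a caller would still need to budget for every temporary vector that is live at the same time, not just one.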
### Component(s)

C++

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
