kou commented on issue #40431:
URL: https://github.com/apache/arrow/issues/40431#issuecomment-1989663465

   Sorry. I missed this.
   
   Thanks. I could run the code:
   
   ```diff
   diff --git a/cpp/src/arrow/compute/key_hash_test.cc 
b/cpp/src/arrow/compute/key_hash_test.cc
   index c998df7169..ccfddaa645 100644
   --- a/cpp/src/arrow/compute/key_hash_test.cc
   +++ b/cpp/src/arrow/compute/key_hash_test.cc
   @@ -311,5 +311,24 @@ TEST(VectorHash, FixedLengthTailByteSafety) {
      HashFixedLengthFrom(/*key_length=*/19, /*num_rows=*/64, /*start_row=*/63);
    }
    
   +TEST(HashBatch, BasicTest) {
   +  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
   +  const int batch_len = arr->length();
   +  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
   +  auto ctx = arrow::compute::default_exec_context();
   +  arrow::util::TempVectorStack stack;
   +  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));
   +
   +  std::vector<uint32_t> hashes(batch_len);
   +  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
   +  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
   +      exec_batch, hashes.data(), temp_column_arrays,
   +      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));
   +
   +  for (int i = 0; i < batch_len; i++) {
   +    std::cout << hashes[i] << " ";
   +  }
   +}
   +
    }  // namespace compute
    }  // namespace arrow
   ```
   
   In general, allocating only required size is preferred. So using the 
`num_rows` rather than `util::MiniBatch::kMiniBatchLength` in `HashBatch` 
related apis may be better. (Sorry, I can't determine whether the `num_rows` is 
the correct size for it right now.)
   
   But it seems that `stack.Init(ctx->memory_pool(), batch_len * 
sizeof(uint32_t))` isn't enough for this case. Because we need at least 3 
allocations for `HashMultiColumn()`:
   
   
https://github.com/apache/arrow/blob/605f8a792c388afb2230b1f19e0f3e4df90d5abe/cpp/src/arrow/compute/key_hash.cc#L387-L395
   
   And we also need 16 bytes metadata:
   
   
https://github.com/apache/arrow/blob/605f8a792c388afb2230b1f19e0f3e4df90d5abe/cpp/src/arrow/compute/util.cc#L35-L44
   
   Anyway, could you try this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to