llama90 opened a new issue, #38386:
URL: https://github.com/apache/arrow/issues/38386

   ### Describe the enhancement requested
   
   Hello. While implementing join support for the Dictionary type, I encountered the following error.
   
   I am attempting to support the Dictionary type through the following steps:
   
   1. Random generation of Dictionary data
   2. Supporting Dictionary type for `non-key` columns
   3. Supporting Dictionary type for `key` columns
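   For step 1, the shape of the random data can be sketched without Arrow. This is a minimal, library-free C++ model and not Arrow's API: `DictColumn` and `RandomDictColumn` are hypothetical names, and a real test would build an actual dictionary array instead. It draws int32 indices uniformly from the dictionary and marks a fraction of rows as null.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <random>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical, library-free model of dictionary<values=string, indices=int32>:
   // `dictionary` holds the distinct values; `indices` holds one code per row,
   // with std::nullopt marking a null row.
   struct DictColumn {
     std::vector<std::string> dictionary;
     std::vector<std::optional<int32_t>> indices;
   };

   // Generates `length` rows whose indices are uniform in [0, dictionary.size()),
   // each row being null with probability `null_prob`. Deterministic per `seed`.
   DictColumn RandomDictColumn(std::vector<std::string> dictionary, int64_t length,
                               double null_prob, uint32_t seed) {
     assert(!dictionary.empty());
     std::mt19937 rng(seed);
     std::uniform_int_distribution<int> pick(0, static_cast<int>(dictionary.size()) - 1);
     std::bernoulli_distribution is_null(null_prob);

     DictColumn col{std::move(dictionary), {}};
     col.indices.reserve(static_cast<std::size_t>(length));
     for (int64_t i = 0; i < length; ++i) {
       if (is_null(rng)) {
         col.indices.push_back(std::nullopt);
       } else {
         col.indices.push_back(static_cast<int32_t>(pick(rng)));
       }
     }
     return col;
   }
   ```

   Keeping the generator seeded makes a failing join test reproducible, which matters when the failure only shows up for particular random schemas.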
   
   I encountered the following error while working on step 2.
   
   `'_error_or_value55.status()' failed with Type error: Unsupported type for RecordBatch sorting: dictionary<values=string, indices=int32, ordered=0>`
   
   The full output is as follows:
   
   ```text
   ...
   
   Test 2: LEFT_SEMI EQ_EQ parallel = true bloom_filter = true
         left schema: large_string / string / dictionary<values=string, indices=int32, ordered=0> / uint64 / null
         right schema: large_string / string / fixed_size_binary[17] / null
   /Users/lama/workspace/arrow-2/cpp/src/arrow/acero/test_util_internal.cc:476: Failure
   Failed
   '_error_or_value55.status()' failed with Type error: Unsupported type for RecordBatch sorting: dictionary<values=string, indices=int32, ordered=0>
   /Users/lama/workspace/arrow-2/cpp/src/arrow/compute/kernels/vector_sort.cc:410  VisitTypeInline(*physical_type, this)
   /Users/lama/workspace/arrow-2/cpp/src/arrow/compute/kernels/vector_sort.cc:391  factory.MakeColumnSort()
   /Users/lama/workspace/arrow-2/cpp/src/arrow/compute/kernels/vector_sort.cc:665  sorter.Sort(begin_offset)
   /Users/lama/workspace/arrow-2/cpp/src/arrow/compute/kernels/vector_sort.cc:1038  sorter.Sort()
   /Users/lama/workspace/arrow-2/cpp/src/arrow/compute/api_vector.cc:316  CallFunction("sort_indices", {datum}, &options, ctx)
   /Users/lama/workspace/arrow-2/cpp/src/arrow/acero/test_util_internal.cc:465  SortIndices(tab, SortOptions(sort_keys))
   Google Test trace:
   /Users/lama/workspace/arrow-2/cpp/src/arrow/acero/hash_join_node_test.cc:1168: LEFT_SEMI EQ_EQ parallel = true bloom_filter = true
   
   ...
   ```
   
   This error appears to occur because sorting is not implemented for the Dictionary type. It surfaces while the test verifies the join output, which it first sorts via `SortIndices` (`test_util_internal.cc:465` in the trace above).
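   With `ordered=0`, the int32 indices say nothing about the order of the values they encode, so a sort cannot simply reuse the integer path; it has to compare the decoded dictionary values. The following minimal, library-free sketch illustrates that idea. `DictColumn` and `SortIndicesByValue` are hypothetical names, not Arrow's `sort_indices` kernel, and placing nulls last is an arbitrary choice for the sketch.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <numeric>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical model of dictionary<values=string, indices=int32, ordered=0>.
   struct DictColumn {
     std::vector<std::string> dictionary;          // distinct values
     std::vector<std::optional<int32_t>> indices;  // std::nullopt = null row
   };

   // Returns row positions sorted ascending by decoded value, nulls last.
   // Because the dictionary is unordered, indices are never compared directly:
   // each row is compared through dictionary[index]. Ties keep input order.
   std::vector<int64_t> SortIndicesByValue(const DictColumn& col) {
     for (const auto& idx : col.indices) {
       assert(!idx || (*idx >= 0 && static_cast<std::size_t>(*idx) < col.dictionary.size()));
     }
     std::vector<int64_t> order(col.indices.size());
     std::iota(order.begin(), order.end(), int64_t{0});
     std::stable_sort(order.begin(), order.end(), [&](int64_t a, int64_t b) {
       const auto& ia = col.indices[static_cast<std::size_t>(a)];
       const auto& ib = col.indices[static_cast<std::size_t>(b)];
       if (!ia || !ib) return ia.has_value() && !ib.has_value();  // nulls last
       return col.dictionary[static_cast<std::size_t>(*ia)] <
              col.dictionary[static_cast<std::size_t>(*ib)];
     });
     return order;
   }
   ```

   For example, with dictionary `{"b", "a", "c"}` and indices `{0, 1, null, 2, 1}`, this returns `{1, 4, 0, 3, 2}`; ordering by the raw codes would instead give `{0, 1, 4, 3, 2}`, which is why the codes alone are not enough.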
   
   Additionally, when I attempted to support the Null type in key columns, I encountered a similar error:
   
   `'_error_or_value45.status()' failed with NotImplemented: Function 'equal' has no kernel matching input types (null, null)`
   
   Tracing these two error messages led me to the following files:
   
   * cpp/src/arrow/compute/kernels/codegen_internal.cc
   * cpp/src/arrow/compute/kernels/codegen_internal.h
   
   Should I build on the logic in these files to implement the following, and then proceed with the join support?
   
   * Sorting for the `dictionary` type
   * The `equal` operation for `(null, null)` input types
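   For the second item, the behavior an `equal` kernel on `(null, null)` would need can be sketched in the same library-free way. This assumes the null-propagating behavior of Arrow's comparison functions (any null input gives a null output) and the EQ/IS distinction of Acero's `JoinKeyCmp`, where EQ never matches null keys and IS treats null as equal to null. `NullColumn`, `EqualNullNull`, `KeyCmp` and `KeysMatch` are hypothetical names for illustration only.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <vector>

   // Hypothetical model of a column of the `null` type: only a length is stored,
   // because every slot is null.
   struct NullColumn {
     int64_t length;
   };

   // Element-wise equal(null, null), assuming comparisons propagate nulls:
   // every output slot is null (std::nullopt), never true or false.
   std::vector<std::optional<bool>> EqualNullNull(const NullColumn& left,
                                                  const NullColumn& right) {
     assert(left.length == right.length);
     return std::vector<std::optional<bool>>(static_cast<std::size_t>(left.length),
                                             std::nullopt);
   }

   // Hypothetical stand-in for the EQ / IS key comparison modes of a join.
   enum class KeyCmp { kEq, kIs };

   // Decides whether two (possibly null) key values match under `cmp`.
   // EQ: a null on either side never matches. IS: null matches only null.
   template <typename T>
   bool KeysMatch(const std::optional<T>& a, const std::optional<T>& b, KeyCmp cmp) {
     if (!a || !b) return cmp == KeyCmp::kIs && !a && !b;
     return *a == *b;
   }
   ```

   For a `null`-typed key column every key is null, so under this model EQ keys never match and IS keys always match; the `equal` result alone (all null) is not enough to decide IS matches.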
   
   I intend to start by adding sorting support for the Dictionary type, since its absence is what triggers the first error.
   
   I would appreciate any advice if I am misunderstanding the problem, or from anyone familiar with this part of the code.
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.