bkietz commented on PR #37418:
URL: https://github.com/apache/arrow/pull/37418#issuecomment-1737718644

   > This doesn't need to be a compute function.
   
   I'm sorry I hadn't proposed this possibility already; it's certainly much 
less complex if it's tenable. Since this issue was opened as a prerequisite of 
https://github.com/apache/arrow/pull/37100, I was unnecessarily stuck on 
handling this problem in the context of compute kernels.
   
   I think providing `DictionaryArray::Compact` will be acceptable. To flesh 
out the proposal a little more:
   
   ```c++
     /// Return an equivalent dictionary array with no unused dictionary entries.
     ///
     /// For example, consider a dictionary array where only two values of the dictionary
     /// are referenced:
     ///    dict_array = {indices=[3, 5, 3, 5], dictionary=["a", "b", "c", "d", "e", "f"]}
     ///    assert dict_array.Compact() == {indices=[0, 1, 0, 1], dictionary=["d", "f"]}
     Result<std::shared_ptr<DictionaryArray>> DictionaryArray::Compact(MemoryPool*) const;
   
     /// Return transpose mapping from uncompacted indices to compacted indices and a
     ///
     /// For example, consider a dictionary array where only two values of the dictionary
     /// are referenced:
     ///    dict_array = {indices=[3, 5, 3, 5], dictionary=["a", "b", "c", "d", "e", "f"]}
     ///    map = dict_array.CompactTransposeIndices()
     ///    assert map == [-1, -1, -1, 0, -1, 1]
     ///    assert dict_array.Transpose(map) == dict_array.Compact()
     Result<std::shared_ptr<Buffer>> DictionaryArray::CompactTransposeIndices(MemoryPool*) const;
   ```
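
To make the intended semantics concrete, here is a rough standalone sketch of the same logic on plain `std::vector`s. It does not use Arrow at all. `CompactTransposeMap` and `CompactPlain` are hypothetical names chosen for illustration, not the proposed API, and the sketch assumes there are no nulls and that every index is within the bounds of the dictionary.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not Arrow API): map each old dictionary position to
// its position in the compacted dictionary, or -1 if it is never referenced.
// Assumes every value in `indices` lies in [0, dict_size).
std::vector<int32_t> CompactTransposeMap(const std::vector<int32_t>& indices,
                                         int32_t dict_size) {
  // First pass: mark which dictionary entries are referenced at all.
  std::vector<bool> used(dict_size, false);
  for (int32_t i : indices) used[i] = true;

  // Second pass: number the used entries consecutively, preserving order.
  std::vector<int32_t> map(dict_size, -1);
  int32_t next = 0;
  for (int32_t i = 0; i < dict_size; ++i) {
    if (used[i]) map[i] = next++;
  }
  return map;
}

// Hypothetical helper (not Arrow API): return the remapped indices together
// with a dictionary that keeps only the referenced entries.
std::pair<std::vector<int32_t>, std::vector<std::string>> CompactPlain(
    const std::vector<int32_t>& indices,
    const std::vector<std::string>& dictionary) {
  const std::vector<int32_t> map =
      CompactTransposeMap(indices, static_cast<int32_t>(dictionary.size()));

  // Keep only the dictionary entries that some index refers to.
  std::vector<std::string> new_dictionary;
  for (std::size_t i = 0; i < dictionary.size(); ++i) {
    if (map[i] >= 0) new_dictionary.push_back(dictionary[i]);
  }

  // Rewrite each index through the transpose map.
  std::vector<int32_t> new_indices;
  new_indices.reserve(indices.size());
  for (int32_t i : indices) new_indices.push_back(map[i]);

  return {new_indices, new_dictionary};
}
```

With the example from the doc comments, `indices=[3, 5, 3, 5]` over a six-entry dictionary gives the map `[-1, -1, -1, 0, -1, 1]`, and applying it yields `indices=[0, 1, 0, 1]` with `dictionary=["d", "f"]`. Splitting the map out from the compaction mirrors the two-method proposal: the map alone is enough for callers that want to transpose the indices themselves.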

