bkietz commented on PR #37418: URL: https://github.com/apache/arrow/pull/37418#issuecomment-1737718644
> This doesn't need to be a compute function.

I'm sorry I hadn't proposed this possibility already; it's certainly much less complex if it's tenable. Since this issue was opened as a prerequisite of https://github.com/apache/arrow/pull/37100, I was unnecessarily stuck on handling this problem in the context of compute kernels. I think providing `DictionaryArray::Compact` will be acceptable.

To flesh out the proposal a little more:

```c++
/// Return an equivalent dictionary array with no unused dictionary entries.
///
/// For example, consider a dictionary array where only two values of the dictionary
/// are referenced:
/// dict_array = {indices=[3, 5, 3, 5], dictionary=["a", "b", "c", "d", "e", "f"]}
/// assert dict_array.Compact() == {indices=[0, 1, 0, 1], dictionary=["d", "f"]}
Result<std::shared_ptr<DictionaryArray>> DictionaryArray::Compact(MemoryPool*) const;

/// Return a transpose mapping from uncompacted indices to compacted indices.
///
/// For example, consider a dictionary array where only two values of the dictionary
/// are referenced:
/// dict_array = {indices=[3, 5, 3, 5], dictionary=["a", "b", "c", "d", "e", "f"]}
/// map = dict_array.CompactTransposeIndices()
/// assert map == [-1, -1, -1, 0, -1, 1]
/// assert dict_array.Transpose(map) == dict_array.Compact()
Result<std::shared_ptr<Buffer>> DictionaryArray::CompactTransposeIndices(MemoryPool*) const;
```
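The following is a minimal, Arrow-free sketch of the semantics described in the doc comments above. It shows how a compaction map can be derived and then applied through a transpose step. `SimpleDictArray` and the free functions `CompactTransposeIndices`, `Transpose` and `Compact` are hypothetical stand-ins, not Arrow's API. The sketch assumes there are no null indices, that every index is within the dictionary's bounds, and that retained entries keep their original relative order.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary array: raw indices plus a dictionary.
// Nulls are not modeled in this sketch.
struct SimpleDictArray {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

// Entry i of the result is the compacted position of dictionary[i], or -1 if
// no index refers to it. Used entries keep their original relative order.
std::vector<int32_t> CompactTransposeIndices(const SimpleDictArray& arr) {
  std::vector<bool> used(arr.dictionary.size(), false);
  for (int32_t idx : arr.indices) used[static_cast<std::size_t>(idx)] = true;

  std::vector<int32_t> map(arr.dictionary.size(), -1);
  int32_t next = 0;
  for (std::size_t i = 0; i < used.size(); ++i) {
    if (used[i]) map[i] = next++;
  }
  return map;
}

// Apply a transpose map. This assumes the non-negative map values are exactly
// 0..k-1 with no duplicates, which holds for maps from CompactTransposeIndices.
SimpleDictArray Transpose(const SimpleDictArray& arr,
                          const std::vector<int32_t>& map) {
  std::size_t new_size = 0;
  for (int32_t m : map) {
    if (m >= 0) ++new_size;
  }

  SimpleDictArray out;
  out.dictionary.resize(new_size);
  for (std::size_t i = 0; i < map.size(); ++i) {
    if (map[i] >= 0) out.dictionary[static_cast<std::size_t>(map[i])] = arr.dictionary[i];
  }

  out.indices.reserve(arr.indices.size());
  for (int32_t idx : arr.indices) {
    int32_t new_idx = map[static_cast<std::size_t>(idx)];
    assert(new_idx >= 0 && "a referenced entry must be retained by the map");
    out.indices.push_back(new_idx);
  }
  return out;
}

// Compact is simply Transpose applied with the compaction map.
SimpleDictArray Compact(const SimpleDictArray& arr) {
  return Transpose(arr, CompactTransposeIndices(arr));
}
```

On the example from the doc comments (`indices=[3, 5, 3, 5]`, dictionary `"a"` through `"f"`), the map comes out as `[-1, -1, -1, 0, -1, 1]` and `Compact` yields `indices=[0, 1, 0, 1]` with dictionary `["d", "f"]`. Expressing `Compact` in terms of the map mirrors the proposed identity `dict_array.Transpose(map) == dict_array.Compact()`.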
