[
https://issues.apache.org/jira/browse/ARROW-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537728#comment-17537728
]
Ariana Villegas edited comment on ARROW-14314 at 5/16/22 6:52 PM:
------------------------------------------------------------------
Ok, I got it.
[~apitrou] In that case, I think we can do something like this:
* Given the following dictionary:
{code:java}
values: ['c', 'a', 'b', 'b']
indices: [0, 1, 3, 2, 3, 0]
{code}
* Compute the sort indices of the values, then convert them to ranks so that equal values share the same rank:
{code:java}
values_sort_idx = [1, 2, 3, 0]
transformed_sort_idx = [3, 0, 1, 1]
{code}
* Map the indices through transformed_sort_idx and compute the sort indices of the result:
{code:java}
transformed_indices = [3, 0, 1, 1, 1, 3]
sort_indices = [1, 2, 3, 4, 0, 5]
{code}
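The three steps above can be sketched in plain Python. This is only an illustration of the idea, not Arrow code: the function name {{dictionary_sort_indices}} is hypothetical, nulls are not handled, and a stable sort is assumed so that equal values keep their original relative order.

```python
def dictionary_sort_indices(values, indices):
    """Sort a dictionary-encoded array by value (sketch, no nulls)."""
    # Step 1: stable argsort of the dictionary values.
    values_sort_idx = sorted(range(len(values)), key=lambda i: values[i])

    # Step 2: convert the argsort into ranks; equal values share one rank.
    ranks = [0] * len(values)
    for pos, i in enumerate(values_sort_idx):
        prev = values_sort_idx[pos - 1] if pos > 0 else None
        if prev is not None and values[prev] == values[i]:
            ranks[i] = ranks[prev]  # tie: reuse the rank of the equal value
        else:
            ranks[i] = pos

    # Step 3: map each index to its rank, then stable-argsort the ranks.
    transformed_indices = [ranks[i] for i in indices]
    return sorted(range(len(transformed_indices)),
                  key=lambda j: transformed_indices[j])


print(dictionary_sort_indices(['c', 'a', 'b', 'b'], [0, 1, 3, 2, 3, 0]))
```

The point of the rank step is that the final sort only compares small integers, so the (possibly expensive) value comparisons happen once per dictionary entry rather than once per array element.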
With nulls, it works similarly, with nulls sorted last:
* Given the following dictionary:
{code:java}
values: ['a', null, 'b', 'c']
indices: [0, 1, null, 0, 2, 3]
{code}
* Compute the sort indices of the values, then convert them to ranks so that equal values share the same rank:
{code:java}
values_sort_idx = [0, 2, 3, 1]
transformed_sort_idx = [0, 3, 1, 2]
{code}
* Map the indices through transformed_sort_idx (null indices stay null) and compute the sort indices of the result:
{code:java}
transformed_indices = [0, 3, null, 0, 1, 2]
sort_indices = [0, 3, 4, 5, 1, 2]
{code}
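A minimal plain-Python sketch of the null case, again not Arrow code. It assumes nulls are represented as {{None}} and are placed last, both in the dictionary values and in the indices; the function name {{dictionary_sort_indices_nulls_last}} is hypothetical.

```python
def dictionary_sort_indices_nulls_last(values, indices):
    """Sort a dictionary-encoded array by value, nulls (None) last (sketch)."""
    # Step 1: stable argsort of the values, with null values placed at the end.
    non_null = sorted((i for i in range(len(values)) if values[i] is not None),
                      key=lambda i: values[i])
    nulls = [i for i in range(len(values)) if values[i] is None]
    values_sort_idx = non_null + nulls

    # Step 2: convert the argsort into ranks; equal values share one rank.
    ranks = [0] * len(values)
    for pos, i in enumerate(values_sort_idx):
        prev = values_sort_idx[pos - 1] if pos > 0 else None
        if prev is not None and values[prev] == values[i]:
            ranks[i] = ranks[prev]
        else:
            ranks[i] = pos

    # Step 3: map each index to its rank; a null index stays null.
    transformed = [None if i is None else ranks[i] for i in indices]

    # Stable argsort of the ranks, with null indices placed at the end.
    ranked = sorted((j for j, r in enumerate(transformed) if r is not None),
                    key=lambda j: transformed[j])
    null_positions = [j for j, r in enumerate(transformed) if r is None]
    return ranked + null_positions


print(dictionary_sort_indices_nulls_last(['a', None, 'b', 'c'],
                                         [0, 1, None, 0, 2, 3]))
```

In this sketch a null dictionary value gets the highest rank, so positions pointing at it land after all non-null values and just before positions whose index is itself null.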
> [C++] Sorting dictionary array not implemented
> ----------------------------------------------
>
> Key: ARROW-14314
> URL: https://issues.apache.org/jira/browse/ARROW-14314
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Neal Richardson
> Priority: Major
> Labels: kernel
> Fix For: 9.0.0
>
>
> From R, taking the stock {{mtcars}} dataset and giving it a dictionary type
> column:
> {code}
> mtcars %>%
> mutate(cyl = as.factor(cyl)) %>%
> Table$create() %>%
> arrange(cyl) %>%
> collect()
> Error: Type error: Sorting not supported for type dictionary<values=string,
> indices=int8, ordered=0>
> ../src/arrow/compute/kernels/vector_array_sort.cc:427 VisitTypeInline(type,
> this)
> ../src/arrow/compute/kernels/vector_sort.cc:148
> GetArraySorter(*physical_type_)
> ../src/arrow/compute/kernels/vector_sort.cc:1206 sorter.Sort()
> ../src/arrow/compute/api_vector.cc:259 CallFunction("sort_indices", {datum},
> &options, ctx)
> ../src/arrow/compute/exec/order_by_impl.cc:53 SortIndices(table, options_,
> ctx_)
> ../src/arrow/compute/exec/sink_node.cc:292 impl_->DoFinish()
> ../src/arrow/compute/exec/exec_plan.cc:297 iterator_.Next()
> ../src/arrow/record_batch.cc:318 ReadNext(&batch)
> ../src/arrow/record_batch.cc:329 ReadAll(&batches)
> {code}