[ https://issues.apache.org/jira/browse/ARROW-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329344#comment-17329344 ]

Eduardo Ponce commented on ARROW-11673:
---------------------------------------

I would be glad to help with this issue.

A question that naturally follows is: what is the expected behavior when 
casting from a larger to a smaller index type and an index value overflows 
the smaller type?

Possible solution: I think the cast should raise an error stating that the 
current data does not allow such a cast.
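A minimal sketch of the error-on-overflow behavior proposed above, written in plain Python rather than Arrow's C++ cast kernels. It assumes a dictionary array can be modeled as a list of integer indices (`None` for nulls) plus a list of values; `signed_range` and `cast_dictionary_indices` are hypothetical helpers, not pyarrow APIs. Note that without extra bookkeeping the check has to visit every index.

```python
# Hypothetical sketch (not the Arrow implementation): narrow the index type of a
# dictionary array modeled as (indices, dictionary), raising on overflow.

def signed_range(bits):
    """Return the (min, max) values of a signed integer that is `bits` wide."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def cast_dictionary_indices(indices, dictionary, target_bits):
    """Check every index against the target width; raise ValueError on overflow.

    The dictionary values are left untouched: only the index width changes.
    """
    lo, hi = signed_range(target_bits)
    for i in indices:  # O(n): every index must be inspected
        if i is not None and not (lo <= i <= hi):
            raise ValueError(
                f"index {i} does not fit in int{target_bits} (range {lo}..{hi})"
            )
    return list(indices), list(dictionary)


# Usage: three indices into a two-value dictionary easily fit in int8.
new_indices, new_dictionary = cast_dictionary_indices([0, 1, 0], ["a", "b"], target_bits=8)
```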

If dictionary types kept track of their largest index value, there would be 
no need to iterate through the data when casting.
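A sketch of the tracking idea, again in plain Python under the same modeling assumptions; `TrackedDictionaryArray`, `max_index`, and `fits_in_signed` are hypothetical names, not part of Arrow. The running maximum is updated on every append, so the overflow check at cast time is a single comparison instead of a scan over the indices.

```python
# Hypothetical sketch: a dictionary-encoded column that records its largest
# index as values are appended, so a narrowing cast can be checked in O(1).

class TrackedDictionaryArray:
    def __init__(self):
        self.dictionary = []  # distinct values, in first-seen order
        self._lookup = {}     # value -> position in self.dictionary
        self.indices = []     # one index per appended value
        self.max_index = -1   # -1 means "no indices yet"

    def append(self, value):
        idx = self._lookup.get(value)
        if idx is None:
            # First occurrence: add the value to the dictionary.
            idx = len(self.dictionary)
            self._lookup[value] = idx
            self.dictionary.append(value)
        self.indices.append(idx)
        self.max_index = max(self.max_index, idx)

    def fits_in_signed(self, bits):
        """Return True if every index fits in a signed integer `bits` wide.

        Indices are never negative here, so only the upper bound matters.
        """
        return self.max_index <= (1 << (bits - 1)) - 1


# Usage: encode three values, then ask whether int8 indices would suffice.
arr = TrackedDictionaryArray()
for v in ["a", "b", "a"]:
    arr.append(v)
ok_for_int8 = arr.fits_in_signed(8)  # True: the largest index is 1
```

In this model `max_index` always equals `len(dictionary) - 1`, so the dictionary length alone would give the same bound.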

> [C++] Casting dictionary type to use different index type
> ---------------------------------------------------------
>
>                 Key: ARROW-11673
>                 URL: https://issues.apache.org/jira/browse/ARROW-11673
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> It's currently not implemented to cast from one dictionary type to another 
> dictionary type to change the index type. 
> Small example:
> {code}
> In [2]: arr = pa.array(['a', 'b', 'a']).dictionary_encode()
> In [3]: arr.type
> Out[3]: DictionaryType(dictionary<values=string, indices=int32, ordered=0>)
> In [5]: arr.cast(pa.dictionary(pa.int8(), pa.string()))
> ...
> ArrowNotImplementedError: Unsupported cast from dictionary<values=string, 
> indices=int32, ordered=0> to dictionary<values=string, indices=int8, 
> ordered=0> (no available cast function for target type)
> ../src/arrow/compute/cast.cc:112  
> GetCastFunctionInternal(cast_options->to_type, args[0].type().get())
> {code}
> From 
> https://stackoverflow.com/questions/66223730/how-to-change-column-datatype-with-pyarrow



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
