Athanassios Hatzis created ARROW-9505:
-----------------------------------------

             Summary: [Python] pa.struct() dictionary-encode not implemented for decimal
                 Key: ARROW-9505
                 URL: https://issues.apache.org/jira/browse/ARROW-9505
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.17.1
            Reporter: Athanassios Hatzis


Hi, given this PyArrow struct array:

 
{code:python}
struct_array.slice(0,3)
Out[52]: 
<pyarrow.lib.StructArray object at 0x7f92061e9dc0>
-- is_valid: all not null
-- child 0 type: int16
 [
 991,
 992,
 993
 ]
-- child 1 type: decimal(6, 3)
 [
 36.100,
 42.300,
 15.300
 ]
{code}
I tried to call its dictionary_encode() method and got this error:

 
{code:python}
struct_array.dictionary_encode()
File "<ipython-input-51-440741990dd7>", line 1, in <module>
 struct_array.dictionary_encode()
 File "pyarrow/array.pxi", line 750, in pyarrow.lib.Array.dictionary_encode
 File "pyarrow/error.pxi", line 106, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: dictionary-encode not implemented for struct<catpid: int16, catcost: decimal(6, 3)>
{code}
I know it is possible to apply dictionary_encode() to each field of the
struct_array and then create a RecordBatch from the dictionary-encoded fields,
so I am not sure why this is not implemented for the struct array as a whole.

I also noticed the conversion RecordBatch.from_struct_array(), but I want the
columns to be dictionary-encoded, and the only way to do that in the current
version is to process each field (column) separately.

BTW: in my project I am addressing a basic problem: how to transform tuples
from any database table into the dictionary-encoded columns of a PyArrow
RecordBatch (Table).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)