[ https://issues.apache.org/jira/browse/ARROW-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927530#comment-16927530 ]

Joris Van den Bossche commented on ARROW-6520:
----------------------------------------------

So the reason it was passing on master is that it was not actually testing it. 
The signature of {{pa.table}} changed, so the first positional argument is now 
names and not schema, and when passing a schema positionally it was silently 
ignored. Will open a separate JIRA for this.
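
As an illustration, here is a hedged sketch of that silent-ignore behaviour, 
assuming the signature at the time was roughly 
{{pa.table(data, names=None, schema=None, ...)}} (the exact signature is an 
assumption on my part):

{code}
import pyarrow as pa

data = {"col": pa.array([b"1234" for _ in range(10)])}
schema = pa.schema([("col", pa.binary(4))])

# Passed positionally, the schema binds to the `names` parameter and is
# effectively ignored, so the table keeps the inferred binary() type.
table = pa.table(data, schema)

# Passing it as a keyword is what actually exercises the schema path
# (and, on master at the time, hits the ArrowTypeError shown below).
table = pa.table(data, schema=schema)
{code}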

Then, when trying to do this properly, it now fails on master (which is better 
than creating an invalid array as before). It boils down to this:

{code}
In [31]: arr = pa.array([b"1234" for _ in range(10)])  

In [32]: pa.array(arr, type=pa.binary(4)) 
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-32-4651f2f37039> in <module>
----> 1 pa.array(arr, type=pa.binary(4))

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: Expected a string or bytes object, got a 
'pyarrow.lib.BinaryValue' object
{code}
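
As a side note, building the column from the raw Python values with the 
fixed-size type directly does not go through that path, so a hedged workaround 
sketch (my own suggestion, not verified on the affected versions) would be:

{code}
import pyarrow as pa
from pyarrow import parquet as pq

# Construct the fixed-size binary array from the raw values instead of
# re-wrapping an already-built pa.Array (which is what fails above).
arr = pa.array([b"1234" for _ in range(10)], type=pa.binary(4))
table = pa.table({"col": arr})
pq.write_table(table, "test.parquet")
{code}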

So due to some changes (I think a bit buried in the Column removal PR: 
https://github.com/apache/arrow/pull/4841), we started calling {{pa.array}} on 
the values in the dict, but we do not yet accept actual pyarrow arrays in 
{{pa.array}} (see ARROW-5295). And apparently we did not yet have the case of a 
dict of arrow arrays plus an explicit schema covered in our tests.
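
A hedged sketch of that code path (not the actual pyarrow internals, just how 
the failure arises when each dict value is re-wrapped with the type taken from 
the passed schema):

{code}
import pyarrow as pa

data = {"col": pa.array([b"1234" for _ in range(10)])}
schema = pa.schema([("col", pa.binary(4))])

arrays = []
for field in schema:
    value = data[field.name]
    try:
        # pa.array() does not yet accept an existing pyarrow Array
        # (ARROW-5295), so this raises ArrowTypeError when `value` is
        # already a pa.Array -- the traceback shown above.
        arrays.append(pa.array(value, type=field.type))
    except pa.ArrowTypeError as exc:
        print("failed to convert column", field.name, ":", exc)
{code}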


> [Python] Segmentation fault on writing tables with fixed size binary fields 
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-6520
>                 URL: https://issues.apache.org/jira/browse/ARROW-6520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>         Environment: python(3.7.3), pyarrow(0.14.1), arrow-cpp(0.14.1), 
> parquet-cpp(1.5.1), Arch Linux x86_64
>            Reporter: Furkan Tektas
>            Priority: Critical
>              Labels: newbie
>             Fix For: 0.15.0
>
>
> I'm not sure if this should be reported to Parquet or here.
> When I try to serialize a pyarrow table with a fixed size binary field 
> (holding 16-byte UUID4 values) to a parquet file, a segmentation fault 
> occurs.
> Here is the minimal example to reproduce:
> {code}
> import pyarrow as pa
> from pyarrow import parquet as pq
> data = {"col": pa.array([b"1234" for _ in range(10)])}
> fields = [("col", pa.binary(4))]
> schema = pa.schema(fields)
> table = pa.table(data, schema)
> pq.write_table(table, "test.parquet")
> {code}
> {{segmentation fault (core dumped) ipython}}
>  
> Yet, it works if I don't specify the size of the binary field.
> {code}
> import pyarrow as pa
> from pyarrow import parquet as pq
> data = {"col": pa.array([b"1234" for _ in range(10)])}
> fields = [("col", pa.binary())]
> schema = pa.schema(fields)
> table = pa.table(data, schema)
> pq.write_table(table, "test.parquet")
> {code}
> Thanks,


