[ 
https://issues.apache.org/jira/browse/ARROW-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497532#comment-17497532
 ] 

Joris Van den Bossche commented on ARROW-15723:
-----------------------------------------------

Thanks for the report. There are potentially multiple issues here.

First, writing null arrays is not supported yet. Using the ORCWriter API 
directly makes this visible (with the table from the code snippet in the 
report):

{code}
In [3]: writer = orc.ORCWriter("test.orc")

In [4]: writer.write(table)
...
ArrowNotImplementedError: Unknown or unsupported Arrow type: null
../src/arrow/adapters/orc/util.cc:1062  GetOrcType(*arrow_child_type)
../src/arrow/adapters/orc/adapter.cc:811  GetOrcType(*(table.schema()))
{code}

However, for some reason this error is not propagated to the caller when using 
{{write_table}} (which uses this ORCWriter in a context manager).

Second, the segfault itself seems to come from closing a file to which nothing 
has been written. The following reproduces it as well:

{code}
In [1]: from pyarrow import orc

In [2]: writer = orc.ORCWriter("test.orc")

In [3]: writer.close()
Segmentation fault (core dumped)
{code}
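
One possible reading of the two observations together is sketched below in 
plain Python. It assumes that the context manager closes the writer on exit 
even after the write has failed; I have not verified this against the pyarrow 
source. {{FakeWriter}} and this {{write_table}} are hypothetical stand-ins, not 
the pyarrow API, and a {{RuntimeError}} takes the place of the segfault so the 
sketch can run to completion.

```python
# Illustrative model only: FakeWriter is a hypothetical stand-in,
# not pyarrow's ORCWriter.
class FakeWriter:
    def __init__(self, log):
        self.log = log
        self.rows_written = 0

    def write(self, table):
        self.log.append("write")
        # Stand-in for ArrowNotImplementedError on a null-typed column.
        raise NotImplementedError("Unknown or unsupported Arrow type: null")

    def close(self):
        self.log.append("close")
        if self.rows_written == 0:
            # Stand-in for the crash seen when closing an empty file.
            raise RuntimeError("simulated crash while closing an empty file")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumption: the writer is always closed on exit, even after an error.
        self.close()
        return False


def write_table(table, log):
    # Hypothetical helper mirroring "ORCWriter used in a context manager".
    with FakeWriter(log) as writer:
        writer.write(table)


log = []
surfaced = None
try:
    write_table(object(), log)
except Exception as err:
    surfaced = err

print(log)                                   # order of calls on the writer
print(type(surfaced).__name__)               # error that reaches the caller
print(type(surfaced.__context__).__name__)   # original error, kept as context
```

In this model the failure raised by {{close()}} is what reaches the caller, and 
the original "unsupported type" error only survives as its context. With a real 
segfault the process would die before either error is reported, which would 
match the behaviour described in the report.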

> [Python] Segfault  orcWriter write table
> ----------------------------------------
>
>                 Key: ARROW-15723
>                 URL: https://issues.apache.org/jira/browse/ARROW-15723
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>            Reporter: patrice
>            Priority: Major
>
> pyarrow segfault when trying to write an orc from a table containing 
> nullArray.
>  
> from pyarrow import orc
> import pyarrow as pa
> a = pa.array([1, None, 3, None])
> b = pa.array([None, None, None, None])
> table = pa.table({"int64": a, "utf8": b})
> orc.write_table(table, 'test.orc')
> zsh: segmentation fault  python3


