[
https://issues.apache.org/jira/browse/ARROW-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Bajger updated ARROW-6038:
--------------------------------
Description:
When creating a Table from a list/iterator of batches that contains an
"empty" RecordBatch, a Table is produced, but attempts to run any pyarrow
built-in function (such as unique()) occasionally result in a segfault.
The MWE is attached: [^segfault_ex.py]
# The segfaults happen randomly, around 30% of the time.
# Commenting out line 10 in the MWE results in no segfaults.
# The segfault is triggered using the unique() function, but I doubt the
behaviour is specific to that function; from what I gather, the problem lies
in Table creation.
I'm on Windows 10, using Python 3.6 and pyarrow 0.14.0 installed through pip
(problem also occurs with 0.13.0 from conda-forge).
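A possible workaround sketch, assuming the corruption is only triggered by
zero-row batches (an assumption based on the observations above, not a
confirmed diagnosis): drop empty batches before building the Table. The helper
name non_empty_batches is hypothetical and not part of pyarrow; it relies only
on the num_rows attribute that RecordBatch exposes.

```python
from typing import Iterable, Iterator, TypeVar

B = TypeVar("B")  # stands in for pyarrow.RecordBatch


def non_empty_batches(batches: Iterable[B]) -> Iterator[B]:
    """Yield only the batches that hold at least one row.

    Hypothetical helper: assumes each batch has a ``num_rows`` attribute,
    as pyarrow.RecordBatch does.
    """
    for batch in batches:
        if batch.num_rows > 0:
            yield batch
```

With pyarrow this would be used as
pa.Table.from_batches(list(non_empty_batches(batches)), schema=schema).
Passing the schema explicitly matters when every batch turns out to be empty,
since there is then no batch left to infer it from.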
was:
When creating a Table from a list/iterator of batches that contains an
"empty" RecordBatch, a Table is produced, but attempts to run any pyarrow
built-in function (such as unique()) occasionally result in a segfault.
The MWE is attached: [^segfault_ex.py]
# The segfaults happen randomly, around 30% of the time.
# Commenting out line 10 in the MWE results in no segfaults.
# The segfault is triggered using the unique() function, but I doubt the
behaviour is specific to that function; from what I gather, the problem lies
in Table creation.
I'm on Windows 10, using Python 3.6 and pyarrow 0.13.0 (py36h8c67754_1) from
conda-forge.
> [Python] pyarrow.Table.from_batches produces corrupted table if any of the
> batches were empty
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-6038
> URL: https://issues.apache.org/jira/browse/ARROW-6038
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 0.13.0, 0.14.0
> Reporter: Piotr Bajger
> Priority: Minor
> Labels: windows
> Attachments: segfault_ex.py