[ https://issues.apache.org/jira/browse/ARROW-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908642#comment-16908642 ]
Wes McKinney commented on ARROW-6038:
-------------------------------------

I confirmed that the MWE is behaving properly now

{code}
$ python ~/Downloads/segfault_ex.py
Creating table
Traceback (most recent call last):
  File "/home/wesm/Downloads/segfault_ex.py", line 11, in <module>
    pa.RecordBatch.from_arrays([pa.array(["C", "C", "C"])], schema),
  File "pyarrow/table.pxi", line 1117, in pyarrow.lib.Table.from_batches
    return pyarrow_wrap_table(c_table)
  File "pyarrow/public-api.pxi", line 316, in pyarrow.lib.pyarrow_wrap_table
    check_status(ctable.get().Validate())
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: Column 0: In chunk 1 expected type string but saw null
{code}

This is still weird and dangerous though:

{code}
In [4]: pa.RecordBatch.from_arrays([pa.array([])], schema)
Out[4]: <pyarrow.lib.RecordBatch at 0x7fc36fa18db8>

In [5]: rb = pa.RecordBatch.from_arrays([pa.array([])], schema)

In [6]: rb
Out[6]: <pyarrow.lib.RecordBatch at 0x7fc37d9c69f8>

In [7]: rb.schema
Out[7]: col: string

In [8]: rb[0]
Out[8]:
<pyarrow.lib.NullArray object at 0x7fc36fa8ce08>
0 nulls
{code}

I opened ARROW-6263

> [Python] pyarrow.Table.from_batches produces corrupted table if any of the
> batches were empty
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6038
>                 URL: https://issues.apache.org/jira/browse/ARROW-6038
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1
>            Reporter: Piotr Bajger
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available, windows
>             Fix For: 0.15.0
>
>         Attachments: segfault_ex.py
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When creating a Table from a list/iterator of batches which contains an
> "empty" RecordBatch a Table is produced but attempts to run any pyarrow
> built-in functions (such as unique()) occasionally result in a Segfault.
> The MWE is attached: [^segfault_ex.py]
> # The segfaults happen randomly, around 30% of the time.
> # Commenting out line 10 in the MWE results in no segfaults.
> # The segfault is triggered using the unique() function, but I doubt the
> behaviour is specific to that function, from what I gather the problem lies
> in Table creation.
> I'm on Windows 10, using Python 3.6 and pyarrow 0.14.0 installed through pip
> (problem also occurs with 0.13.0 from conda-forge).