[ 
https://issues.apache.org/jira/browse/ARROW-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908642#comment-16908642
 ] 

Wes McKinney commented on ARROW-6038:
-------------------------------------

I confirmed that the MWE now behaves properly:

{code}
$ python ~/Downloads/segfault_ex.py 
Creating table
Traceback (most recent call last):
  File "/home/wesm/Downloads/segfault_ex.py", line 11, in <module>
    pa.RecordBatch.from_arrays([pa.array(["C", "C", "C"])], schema),
  File "pyarrow/table.pxi", line 1117, in pyarrow.lib.Table.from_batches
    return pyarrow_wrap_table(c_table)
  File "pyarrow/public-api.pxi", line 316, in pyarrow.lib.pyarrow_wrap_table
    check_status(ctable.get().Validate())
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: Column 0: In chunk 1 expected type string but saw null
{code}

This is still weird and dangerous, though (the schema says string, but the column is a NullArray):

{code}
In [4]: pa.RecordBatch.from_arrays([pa.array([])], schema)
Out[4]: <pyarrow.lib.RecordBatch at 0x7fc36fa18db8>

In [5]: rb = pa.RecordBatch.from_arrays([pa.array([])], schema)

In [6]: rb
Out[6]: <pyarrow.lib.RecordBatch at 0x7fc37d9c69f8>

In [7]: rb.schema
Out[7]: col: string

In [8]: rb[0]
Out[8]: 
<pyarrow.lib.NullArray object at 0x7fc36fa8ce08>
0 nulls
{code}
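
A minimal sketch of a check that would flag a batch like the one above, where the declared schema and the stored arrays disagree. The helper name find_type_mismatches is hypothetical and not part of pyarrow; it only assumes the batch exposes schema, num_columns, and integer indexing, as the session above uses:

```python
def find_type_mismatches(batch):
    """Return (column name, declared type, actual type) for each column
    whose array type differs from the type declared in the batch schema.

    Hypothetical helper, not a pyarrow API. It relies only on
    batch.schema, batch.num_columns and integer indexing.
    """
    mismatches = []
    for i in range(batch.num_columns):
        field = batch.schema[i]   # declared field (name and type)
        actual = batch[i].type    # type of the array actually stored
        if field.type != actual:
            mismatches.append((field.name, field.type, actual))
    return mismatches
```

Applied to the rb from the session above, this would be expected to report the "col" column as declared string but stored as null, though the behaviour of from_arrays may differ between pyarrow versions.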

I opened ARROW-6263 to track this.

> [Python] pyarrow.Table.from_batches produces corrupted table if any of the 
> batches were empty
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6038
>                 URL: https://issues.apache.org/jira/browse/ARROW-6038
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1
>            Reporter: Piotr Bajger
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available, windows
>             Fix For: 0.15.0
>
>         Attachments: segfault_ex.py
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When creating a Table from a list/iterator of batches which contains an 
> "empty" RecordBatch, a Table is produced, but attempts to run any pyarrow 
> built-in functions (such as unique()) occasionally result in a segfault.
> The MWE is attached: [^segfault_ex.py]
>  # The segfaults happen randomly, around 30% of the time.
>  # Commenting out line 10 in the MWE results in no segfaults.
>  # The segfault is triggered using the unique() function, but I doubt the 
> behaviour is specific to that function; from what I gather, the problem lies 
> in Table creation.
> I'm on Windows 10, using Python 3.6 and pyarrow 0.14.0 installed through pip 
> (problem also occurs with 0.13.0 from conda-forge).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
