[ 
https://issues.apache.org/jira/browse/ARROW-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628423#comment-17628423
 ] 

Joris Van den Bossche commented on ARROW-18229:
-----------------------------------------------

I opened a PR to just ensure the argument has to be a schema (I like the idea 
of allowing a dictionary, but that's something we should then also consider in 
other places, starting with creating a schema in {{pa.schema(..)}}, I think).

It's a bit peculiar that we require a Schema for RecordBatchReader.from_batches 
({{PyRecordBatchReader}}) but then don't actually use that schema for anything 
(except for exposing it as the {{schema}} attribute of the reader): reading 
works fine in the above example, and the reader also happily returns batches 
with a different schema than the one you specified.

> [C++][Python] RecordBatchReader can be created with a 'dict' schema which 
> then crashes on use
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-18229
>                 URL: https://issues.apache.org/jira/browse/ARROW-18229
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 10.0.0
>            Reporter: David Li
>            Assignee: Joris Van den Bossche
>            Priority: Blocker
>              Labels: pull-request-available, triaged
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Presumably we should disallow this or convert it to a schema?
> https://github.com/duckdb/duckdb/issues/5143
> {noformat}
> >>> import pyarrow as pa
> >>> pa.__version__
> '10.0.0'
> >>> reader = pa.RecordBatchReader.from_batches({"a": pa.int8()}, [])
> >>> reader.schema
> fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)
> (gdb) bt
> #0  0x00007ffff4247580 in arrow::Schema::num_fields() const ()
>    from 
> /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
> #1  0x00007ffff42b93f7 in arrow::(anonymous namespace)::SchemaPrinter::Print()
>     ()
>    from 
> /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
> #2  0x00007ffff42b98a7 in arrow::PrettyPrint(arrow::Schema const&, 
> arrow::PrettyPrintOptions const&, std::string*) ()
>    from 
> /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
> #3  0x00007ffff64f814b in 
> __pyx_pw_7pyarrow_3lib_6Schema_52to_string(_object*, _object*, _object*) ()
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
