David Li created ARROW-18229:
--------------------------------

             Summary: [C++][Python] RecordBatchReader can be created with a 'dict' schema which then crashes on use
                 Key: ARROW-18229
                 URL: https://issues.apache.org/jira/browse/ARROW-18229
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 10.0.0
            Reporter: David Li


Presumably we should disallow this or convert it to a schema?

https://github.com/duckdb/duckdb/issues/5143

{noformat}
>>> import pyarrow as pa
>>> pa.__version__
'10.0.0'
>>> reader = pa.RecordBatchReader.from_batches({"a": pa.int8()}, [])
>>> reader.schema
fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)

(gdb) bt
#0  0x00007ffff4247580 in arrow::Schema::num_fields() const ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#1  0x00007ffff42b93f7 in arrow::(anonymous namespace)::SchemaPrinter::Print() ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#2  0x00007ffff42b98a7 in arrow::PrettyPrint(arrow::Schema const&, arrow::PrettyPrintOptions const&, std::string*) ()
   from /home/lidavidm/miniconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1000
#3  0x00007ffff64f814b in __pyx_pw_7pyarrow_3lib_6Schema_52to_string(_object*, _object*, _object*) ()
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
