Nolo Ogbirner created ARROW-9818:
------------------------------------
Summary: Obscure C++ Error when Calling to_pandas on a RecordBatch
Key: ARROW-9818
URL: https://issues.apache.org/jira/browse/ARROW-9818
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 1.0.0
Environment: AWS Lambda with pyarrow 1.0.0
Reporter: Nolo Ogbirner
I'm using pyarrow to stream a CSV file over HTTP, converting each RecordBatch
to a pandas DataFrame for manipulation. For testing, I'm using the NYPD Motor
Vehicle Collisions open dataset. However, for anything larger than the 5MB
file (e.g. the 240MB and 1GB files), my code, which runs in an AWS Lambda,
fails with a RuntimeError caused by:
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
The error appears after calling to_pandas() on the first batch. Why is this
happening? How can I fix it?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)