[
https://issues.apache.org/jira/browse/ARROW-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nolo Ogbirner updated ARROW-9818:
---------------------------------
Summary: Obscure C++ Error when Calling to_pandas on a RecordBatch (was:
Obscure C++ Error when Callign to_pandas on a RecordBatch)
> Obscure C++ Error when Calling to_pandas on a RecordBatch
> ---------------------------------------------------------
>
> Key: ARROW-9818
> URL: https://issues.apache.org/jira/browse/ARROW-9818
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 1.0.0
> Environment: AWS Lambda with pyarrow 1.0.0
> Reporter: Nolo Ogbirner
> Priority: Critical
>
> I'm using PyArrow to stream a CSV from an input over HTTP and then convert
> each RecordBatch to a pandas DataFrame for manipulation. For testing, I'm
> using the NYPD Motor Vehicle Collisions open dataset. However, for
> any file larger than the 5 MB one (e.g. 240 MB or 1 GB), my code, which runs
> in an AWS Lambda, fails with a RuntimeError because of
> terminate called after throwing an instance of 'std::logic_error'
> what(): basic_string::_S_construct null not valid
> after calling to_pandas() on the first batch. Why is this happening? How can
> I fix it?