Chris Ellison created ARROW-2227:
------------------------------------

             Summary: Table.from_pandas does not create chunked_arrays.
                 Key: ARROW-2227
                 URL: https://issues.apache.org/jira/browse/ARROW-2227
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
            Reporter: Chris Ellison
When converting a DataFrame with a large enough column, pyarrow raises an exception:

{code:python}
import pandas as pd
import pyarrow as pa

# 2**31 one-character strings, i.e. 2**31 bytes of string data in a single column
x = list('1' * 2**31)
y = pd.DataFrame({'x': x})
t = pa.Table.from_pandas(y)
# ArrowInvalid: BinaryArrow cannot contain more than 2147483646 bytes, have 2147483647
{code}

The array should be chunked for the user. As it stands, DataFrames with more than 2 GiB of string/binary data in a single column fail to convert to Arrow.
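Until this is handled inside pyarrow, the chunking the reporter asks for can be approximated by the caller. The sketch below is a minimal, stdlib-only illustration and not pyarrow's own behavior: `chunk_boundaries` and `MAX_BINARY_BYTES` are hypothetical names, and the limit is simply taken from the error message above. It greedily groups consecutive rows so that each group's UTF-8 byte total stays at or under the limit.

```python
from typing import Iterable, List, Tuple

# Hypothetical constant: the byte limit quoted in the ArrowInvalid message above.
MAX_BINARY_BYTES = 2147483646


def chunk_boundaries(values: Iterable[str], limit: int = MAX_BINARY_BYTES) -> List[Tuple[int, int]]:
    """Greedily split rows into [start, stop) ranges whose UTF-8 byte totals stay <= limit.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    bounds: List[Tuple[int, int]] = []
    start = 0   # first row of the chunk currently being filled
    total = 0   # bytes accumulated in the current chunk
    n = 0       # number of rows seen so far
    for i, value in enumerate(values):
        n = i + 1
        size = len(value.encode("utf-8"))
        if size > limit:
            # A single row larger than the limit can never fit in any chunk.
            raise ValueError(f"row {i} alone is {size} bytes, over the {limit}-byte limit")
        if total + size > limit:
            # Adding this row would overflow the current chunk, so close it first.
            bounds.append((start, i))
            start = i
            total = 0
        total += size
    if n > start:
        # Close the final, partially filled chunk.
        bounds.append((start, n))
    return bounds
```

Each (start, stop) range could then, for example, be converted separately with `pa.Table.from_pandas(y.iloc[start:stop], preserve_index=False)` and the pieces combined with `pa.concat_tables`, giving a column backed by a multi-chunk ChunkedArray. For the 2**31 one-byte rows in the example above, the default limit yields two ranges.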