Dave Challis created ARROW-2406:
-----------------------------------

             Summary: [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided
                 Key: ARROW-2406
                 URL: https://issues.apache.org/jira/browse/ARROW-2406
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Mac OS High Sierra
                      Python 3.6.3
            Reporter: Dave Challis

Minimal example to recreate:

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)

schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}

This causes the python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:

{code:python}
# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)

schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
{code}

{code:python}
# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)

pa.Table.from_pandas(df)
{code}
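
Because the failing case kills the interpreter outright, it may be easier to confirm the crash from a child process. The following is a minimal sketch rather than part of the original report: `run_isolated` and `describe` are hypothetical helper names, and it assumes a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code (so SIGSEGV would show up as -11).

```python
import subprocess
import sys
import textwrap

# The crashing snippet from the report, kept as source text so that it
# only ever runs inside a separate interpreter.
REPRO = textwrap.dedent("""
    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame({'a': []})
    df['a'] = df['a'].astype(str)

    schema = pa.schema([pa.field('a', pa.string())])
    pa.Table.from_pandas(df, schema=schema)
""")


def run_isolated(source):
    """Run `source` in a fresh Python interpreter and return its exit code.

    Hypothetical helper: the child's output is discarded, only the
    return code is reported.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short description.

    Assumes POSIX semantics: a negative code means the child was
    terminated by the signal with that number.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by signal {}".format(-returncode)
    return "exited with status {}".format(returncode)
```

Calling `describe(run_isolated(REPRO))` on an affected installation would be expected to report a signal rather than a normal exit. If pandas or pyarrow is missing, the child fails with an ordinary non-zero status instead.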