[ https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430261#comment-16430261 ]
Dave Challis commented on ARROW-2406:
-------------------------------------

[~kszucs] My mistake: I retested and noticed I was using an older environment with pyarrow 0.8.0. It looks like the issue was resolved in 0.9.0.

> [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2406
>                 URL: https://issues.apache.org/jira/browse/ARROW-2406
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Mac OS High Sierra
>                      Python 3.6.3
>            Reporter: Dave Challis
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>
> This causes the Python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but it is not cast to 'str' in Pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
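Because the failure mode here is a hard crash rather than a Python exception, retesting across environments (as in the comment above) is easier if the reproducer runs in a separate interpreter. The following is a minimal sketch, not part of the original report, assuming a POSIX system (macOS or Linux), where a child killed by a signal reports a negative return code. `run_isolated` and `describe` are hypothetical helper names. Per the report, pyarrow 0.8.0 would be expected to show up as a SIGSEGV crash, while on 0.9.0 or later the comment indicates the conversion should complete.

```python
import signal
import subprocess
import sys

# The minimal reproducer from the report, run verbatim in a child interpreter.
REPRO = """
import pandas as pd
import pyarrow as pa
print("pyarrow", pa.__version__)
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
table = pa.Table.from_pandas(df, schema=schema)
print("rows:", table.num_rows)
"""


def run_isolated(source, timeout=60.0):
    """Run `source` in a fresh Python process so a segfault cannot kill the caller."""
    # PIPE + universal_newlines keeps this working on Python 3.6 as well.
    return subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )


def describe(result):
    """Summarize how the child process ended."""
    # On POSIX, a child terminated by signal N has returncode == -N.
    if result.returncode < 0:
        return "crashed with " + signal.Signals(-result.returncode).name
    if result.returncode != 0:
        return "exited with status {}:\n{}".format(
            result.returncode, result.stderr.strip()
        )
    return "ok:\n" + result.stdout.strip()


if __name__ == "__main__":
    print(describe(run_isolated(REPRO)))
```

Running the script prints the installed pyarrow version and row count when the conversion succeeds, an exit status with the traceback if pandas or pyarrow raises, or the signal name (for example `SIGSEGV`) if the interpreter crashes.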