[ https://issues.apache.org/jira/browse/ARROW-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jarno Seppanen updated ARROW-1440:
----------------------------------
Description:
Reading the attached Parquet file into a pandas DataFrame and then using the
DataFrame causes a segfault.
{noformat}
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pyarrow
>>> import pyarrow.parquet as pq
>>> pyarrow.__version__
'0.6.0'
>>> import pandas as pd
>>> pd.__version__
'0.19.0'
>>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
...     .to_pandas()
>>> len(df)
69
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 6 columns):
label               69 non-null int32
account_meta        69 non-null object
features_type       69 non-null int32
features_size       69 non-null int32
features_indices    1 non-null object
features_values     1 non-null object
dtypes: int32(3), object(3)
memory usage: 2.5+ KB
>>>
>>> pd.concat([df, df])
Segmentation fault (core dumped)
{noformat}
Actually, just print(df) is enough to trigger the segfault.
was:
Reading the attached Parquet file into a pandas DataFrame and then inspecting the
DataFrame causes a segfault.
{noformat}
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pyarrow
>>> import pyarrow.parquet as pq
>>> pyarrow.__version__
'0.6.0'
>>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
...     .to_pandas()
>>> len(df)
69
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 6 columns):
label               69 non-null int32
account_meta        69 non-null object
features_type       69 non-null int32
features_size       69 non-null int32
features_indices    1 non-null object
features_values     1 non-null object
dtypes: int32(3), object(3)
memory usage: 2.5+ KB
>>>
>>> print(df)
Segmentation fault (core dumped)
{noformat}
> Segmentation fault after loading parquet file to pandas dataframe
> -----------------------------------------------------------------
>
> Key: ARROW-1440
> URL: https://issues.apache.org/jira/browse/ARROW-1440
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.6.0
> Environment: ubuntu 16.04.2
> Reporter: Jarno Seppanen
> Attachments:
> part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet
>
>
> Reading the attached Parquet file into a pandas DataFrame and then using the
> DataFrame causes a segfault.
> {noformat}
> Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>>
> >>> import pyarrow
> >>> import pyarrow.parquet as pq
> >>> pyarrow.__version__
> '0.6.0'
> >>> import pandas as pd
> >>> pd.__version__
> '0.19.0'
> >>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
> ...     .to_pandas()
> >>> len(df)
> 69
> >>> df.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 69 entries, 0 to 68
> Data columns (total 6 columns):
> label               69 non-null int32
> account_meta        69 non-null object
> features_type       69 non-null int32
> features_size       69 non-null int32
> features_indices    1 non-null object
> features_values     1 non-null object
> dtypes: int32(3), object(3)
> memory usage: 2.5+ KB
> >>>
> >>> pd.concat([df, df])
> Segmentation fault (core dumped)
> {noformat}
> Actually, just print(df) is enough to trigger the segfault.
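
Because the crash takes the interactive interpreter down with it, it may help to run the reproduction in a child process with Python's faulthandler enabled. The following is only a sketch and not part of the original report: run_isolated, describe and REPRO are hypothetical names, and it assumes a POSIX system, where a negative return code means the child was killed by a signal. The Parquet file name is the attachment from this issue and must be in the working directory.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter; return (returncode, stderr)."""
    # "-X faulthandler" asks the child to dump a Python traceback to stderr
    # when it receives a fatal signal such as SIGSEGV.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode < 0:
        # POSIX convention: -N means the child was terminated by signal N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)


# Same steps as the transcript in this issue, written as a script.
REPRO = """
import pyarrow.parquet as pq
df = pq.read_table(
    'part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet'
).to_pandas()
print(df)
"""

if __name__ == "__main__":
    code, err = run_isolated(REPRO)
    print(describe(code))
    # stderr holds the faulthandler traceback if the child crashed,
    # or an ordinary Python traceback if the import or file read failed.
    print(err)
```

If the child is reported as killed by SIGSEGV, the captured stderr should show which Python-level call was active when the crash happened, which narrows down whether the fault is in to_pandas() or in the later formatting of the object columns.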
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)