[ https://issues.apache.org/jira/browse/ARROW-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-8694.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 7103
[https://github.com/apache/arrow/pull/7103]

> [Python][Parquet] parquet.read_schema() fails when loading wide table created from Pandas DataFrame
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8694
>                 URL: https://issues.apache.org/jira/browse/ARROW-8694
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.17.0
>         Environment: Linux OS with RHEL 7.7 distribution
>            Reporter: Eric Kisslinger
>            Assignee: Wes McKinney
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.17.1, 1.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> parquet.read_schema() fails when loading the schema of a wide table created from a Pandas DataFrame with 50,000 columns. This works with pyarrow 0.16.0.
> {code:java}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> print(pa.__version__)
> df = pd.DataFrame({'c' + str(i): np.random.randn(10) for i in range(50000)})
> table = pa.Table.from_pandas(df)
> pq.write_table(table, "test_wide.parquet")
> schema = pq.read_schema('test_wide.parquet'){code}
> Output:
> 0.17.0
> Traceback (most recent call last):
>   File "/GAAL/kisseri/conda_envs/blkmamba-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
>     exec(code_obj, self.user_global_ns, self.user_ns)
>   File "<ipython-input-29-d5ef2df77263>", line 9, in <module>
>     table = pq.read_schema('test_wide.parquet')
>   File "/GAAL/kisseri/conda_envs/blkmamba-dev/lib/python3.6/site-packages/pyarrow/parquet.py", line 1793, in read_schema
>     return ParquetFile(where, memory_map=memory_map).schema.to_arrow_schema()
>   File "/GAAL/kisseri/conda_envs/blkmamba-dev/lib/python3.6/site-packages/pyarrow/parquet.py", line 210, in __init__
>     read_dictionary=read_dictionary, metadata=metadata)
>   File "pyarrow/_parquet.pyx", line 1023, in pyarrow._parquet.ParquetReader.open
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> OSError: Couldn't deserialize thrift: TProtocolException: Exceeded size limit

--
This message was sent by Atlassian Jira
(v8.3.4#803005)