[jira] [Closed] (ARROW-3956) [Python] ParquetWriter.write_table isn't working

Wes McKinney (JIRA) Fri, 07 Dec 2018 14:12:38 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wes McKinney closed ARROW-3956.
-------------------------------
    Resolution: Duplicate

This was resolved in
https://github.com/apache/arrow/commit/10b204ec2532d8e30be157bcfd3af53d41f42ffb.
I verified that the issue is not present on the master branch.

> [Python] ParquetWriter.write_table isn't working
> ------------------------------------------------
>
>                 Key: ARROW-3956
>                 URL: https://issues.apache.org/jira/browse/ARROW-3956
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: David Lee
>            Priority: Major
>
> ParquetWriter.write_table fails with an error saying the table schema doesn't match the
> file schema, even though the two schemas do match.
>  
> Error:
> {code:java}
> >>> writer.write_table(arrow_table)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "../lib/python3.6/site-packages/pyarrow/parquet.py", line 374, in write_table
>     raise ValueError(msg)
> ValueError: Table schema does not match schema used to create file:
> table:
> col1: int64
> col2: int64
> metadata
> --------
> {b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
> b' "col1", "field_name": "col1", "pandas_type": "int64", "numpy_ty'
> b'pe": "int64", "metadata": null}, {"name": "col2", "field_name": '
> b'"col2", "pandas_type": "int64", "numpy_type": "int64", "metadata'
> b'": null}], "pandas_version": "0.23.4"}'} vs.
> file:
> col1: int64
> col2: int64
> {code}
> Test Script:
> {code:java}
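> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
>
> # Build an Arrow table from a pandas DataFrame. As the error output shows,
> # the resulting table schema carries b'pandas' metadata.
> d = {'col1': [1, 2], 'col2': [3, 4]}
> df = pd.DataFrame(data=d)
> arrow_table = pa.Table.from_pandas(df, preserve_index=False)
> arrow_table
>
> # The module-level function succeeds.
> pq.write_table(arrow_table, "test.parquet")
>
> # Hand-built schema with the same fields but no metadata.
> test_schema = pa.schema([
>     pa.field('col1', pa.int64()),
>     pa.field('col2', pa.int64())
> ])
> writer = pq.ParquetWriter("test2.parquet", use_dictionary=True,
>                           schema=test_schema, compression='snappy')
> # On 0.11.1 this call raises the ValueError shown above.
> writer.write_table(arrow_table)
> writer.close()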
> {code}
> pq.write_table() works, but ParquetWriter.write_table() does not.
> I think something is wrong with the schema object.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
