[ https://issues.apache.org/jira/browse/ARROW-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704034#comment-16704034 ]
Wes McKinney commented on ARROW-3728:
-------------------------------------
There are a number of possible feature requests and other design considerations
here. Can you open some other JIRA issues or start a mailing list discussion?
> [Python] Merging Parquet Files - Pandas Meta in Schema Mismatch
> ---------------------------------------------------------------
>
> Key: ARROW-3728
> URL: https://issues.apache.org/jira/browse/ARROW-3728
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.10.0, 0.11.0, 0.11.1
> Environment: Python 3.6.3
> OSX 10.14
> Reporter: Micah Williamson
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: parquet, pull-request-available
> Fix For: 0.12.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> From:
> https://stackoverflow.com/questions/53214288/merging-parquet-files-pandas-meta-in-schema-mismatch
>
> I am trying to merge multiple parquet files into one. Their schemas are
> identical field-wise but my {{ParquetWriter}} is complaining that they are
> not. After some investigation I found that the pandas metadata in the
> schemas differs, causing this error.
>
> Sample-
> {code:python}
> import pyarrow.parquet as pq
>
> pq_tables = []
> writer = None  # created lazily from the first table's schema
> for file_ in files:
>     pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
>     pq_tables.append(pq_table)
>     if writer is None:
>         writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema,
>                                   use_deprecated_int96_timestamps=True)
>     # Raises ValueError when a later table's schema metadata differs
>     writer.write_table(table=pq_table)
> {code}
> The error-
> {code}
> Traceback (most recent call last):
>   File "{PATH_TO}/main.py", line 68, in lambda_handler
>     writer.write_table(table=pq_table)
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
>     raise ValueError(msg)
> ValueError: Table schema does not match schema used to create file:
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)