sergiimk opened a new issue #936:
URL: https://github.com/apache/arrow-datafusion/issues/936
**Describe the bug**
I am attempting to read two parquet files (produced by Spark) as a single
table and am getting this error:
```
Error during planning: The Parquet files have 2 different schemas and
DataFusion does not yet support schema merging
```
Inspecting both files with `parquet-schema` shows identical schemas:
```
message spark_schema {
  required int96 system_time;
  optional int32 date (DATE);
  optional int64 usd_am (DECIMAL(18,4));
  optional int64 usd_pm (DECIMAL(18,4));
  optional int64 gbp_am (DECIMAL(18,4));
  optional int64 gbp_pm (DECIMAL(18,4));
  optional int64 euro_am (DECIMAL(18,4));
  optional int64 euro_pm (DECIMAL(18,4));
}
```
Inspecting with `parquet-meta` seems to show only one relevant difference:
* First file has `extra: org.apache.spark.version = 3.0.1`
* Second file has `extra: org.apache.spark.version = 3.1.2`
**To Reproduce**
Place the attached files in a directory and read them with the
`ExecutionContext::read_parquet` API.
[example.tar.gz](https://github.com/apache/arrow-datafusion/files/7036002/example.tar.gz)
**Expected behavior**
Insignificant metadata differences should be ignored, and the
schema-merging error should be raised only for files whose schemas
genuinely differ.