[ https://issues.apache.org/jira/browse/DRILL-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570333#comment-16570333 ]
Oleksandr Kalinin commented on DRILL-6670:
------------------------------------------

I'm checking this against the changes in DRILL-5797; indeed, having a sample file would be helpful.

> Error in parquet record reader - previously readable file fails to be read in 1.14
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-6670
>                 URL: https://issues.apache.org/jira/browse/DRILL-6670
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Dave Challis
>            Priority: Major
>
> A Parquet file generated by PyArrow was readable in Apache Drill 1.12
> and 1.13, but cannot be read with 1.14.
> Running the query "SELECT * FROM dfs.`foo.parquet`" results in the following
> error message from the Drill web query UI:
> {code}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR:
> Error in parquet record reader. Message: Failure in setting up reader Parquet
> Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional
> binary name (UTF8); optional binary creation_parameters (UTF8); optional
> int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; optional
> int32 schema_version; } , metadata: {pandas={"index_columns": [],
> "column_indexes": [], "columns": [{"name": "name", "field_name": "name",
> "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name":
> "creation_parameters", "field_name": "creation_parameters", "pandas_type":
> "unicode", "numpy_type": "object", "metadata": null}, {"name":
> "creation_date", "field_name": "creation_date", "pandas_type": "datetime",
> "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version",
> "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32",
> "metadata": null}, {"name": "schema_version", "field_name": "schema_version",
> "pandas_type": "int32", "numpy_type": "int32", "metadata": null}],
> "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 27142
> [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4},
> ColumnMetaData{SNAPPY [creation_parameters] optional binary
> creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY
> [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE],
> 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version
> [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32
> schema_version [PLAIN, RLE], 46593}]}]} Fragment 0:0 [Error Id:
> bdb2e4d5-5982-4cc6-b95e-244782f827d2 on f9d0456cddd2:31010]
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)