Dave Challis created DRILL-7735:
-----------------------------------

Summary: Query against empty parquet file fails with: IndexOutOfBoundsException: Index: 0, Size: 0
Key: DRILL-7735
URL: https://issues.apache.org/jira/browse/DRILL-7735
Project: Apache Drill
Issue Type: Bug
Components: Server, Storage - Parquet
Affects Versions: 1.17.0
Environment: 64Gb machine running on AWS.
Reporter: Dave Challis
Attachments: dispute.parquet, drillbit.log
Running a `SELECT *` query against an empty Parquet file (i.e. one with correct column metadata written, but no rows) triggers an `IndexOutOfBoundsException`.

I have an empty Parquet file with the following schema:

{noformat}
$ parquet-tools schema dispute.parquet
message parquet_go_root {
  required int32 dispute_id (INT_32) = 0;
  required binary title (UTF8) = 0;
  optional int32 start_date (DATE) = 0;
  optional int32 end_date (DATE) = 0;
  optional binary docket_number (UTF8) = 0;
  required binary route (UTF8) = 0;
  required binary jurisdiction (UTF8) = 0;
}
{noformat}

If I then run the following query via the Drill web UI:

{noformat}
SELECT * FROM dfs.`/data/dispute.parquet`
{noformat}

I get the following error from Drill:

{noformat}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: Index: 0, Size: 0

Please, refer to logs for more information.

[Error Id: a93e1aa1-a7e6-4bc9-9f11-c42b9f6fe108 on e531a6492cf4:31010]
{noformat}

The expected result was simply an empty result set (i.e. 0 rows).

I've attached the Parquet file in question, along with the relevant entries from drillbit.log.
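One plausible cause, not confirmed here (the actual stack trace is only in the attached drillbit.log): a file written with a schema but no rows may contain no row groups at all, or a single row group with zero rows, and any reader code that indexes the first row group without checking would throw an `IndexOutOfBoundsException` of exactly this shape. The Java sketch below is hypothetical; `RowGroupGuard`, `firstRowGroupRows` and `totalRows` are illustrative names, not Drill APIs, and the row-group metadata is modelled as a plain `List<Long>` of per-group row counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Drill's actual Parquet reader code.
public class RowGroupGuard {

    // Unguarded pattern: assumes at least one row group exists.
    // On an empty list, get(0) throws IndexOutOfBoundsException.
    static long firstRowGroupRows(List<Long> rowGroupRowCounts) {
        return rowGroupRowCounts.get(0);
    }

    // Guarded pattern: iterate instead of indexing, so a file with no
    // row groups (or only zero-row groups) just yields a total of 0 rows.
    static long totalRows(List<Long> rowGroupRowCounts) {
        long total = 0L;
        for (long rows : rowGroupRowCounts) {
            total += rows;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> noRowGroups = new ArrayList<>();
        System.out.println("rows (guarded): " + totalRows(noRowGroups));
        try {
            firstRowGroupRows(noRowGroups);
        } catch (IndexOutOfBoundsException e) {
            // The exact message text differs between JDK versions.
            System.out.println("unguarded access failed: " + e.getMessage());
        }
    }
}
```

Iterating over whatever row groups exist, rather than reaching for the first one, lets the zero-row case fall out naturally as an empty result set, which matches the expected behaviour described in this issue.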