[
https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655552#comment-17655552
]
Micah Kornfield commented on PARQUET-2219:
------------------------------------------
I'm not aware of anything in the specification that prevents zero-length row
groups. We can try to prevent writing them out, but I think readers should be
robust to this case as long as it isn't disallowed by the specification. For
the iterator case, it seems like the row group should just be discarded and
the next one checked?
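
To illustrate the skip-and-continue approach at the call site, here is a minimal sketch assuming parquet-mr's ParquetFileReader API (getFooter(), readNextRowGroup(), skipNextRowGroup()) as exposed in recent releases; the class name and overall structure are illustrative, not a proposed patch:

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class SkipEmptyRowGroups {
  public static void main(String[] args) throws IOException {
    Path path = new Path(args[0]);
    Configuration conf = new Configuration();

    try (ParquetFileReader reader =
             ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
      // Row-group metadata from the footer; a zero-row group shows up
      // here with getRowCount() == 0 even though its headers are present.
      List<BlockMetaData> blocks = reader.getFooter().getBlocks();

      for (BlockMetaData block : blocks) {
        if (block.getRowCount() == 0) {
          // Discard the empty row group and check the next one,
          // rather than letting readNextRowGroup() throw
          // "Illegal row group of 0 rows".
          reader.skipNextRowGroup();
          continue;
        }
        PageReadStore pages = reader.readNextRowGroup();
        System.out.println("Read row group with " + pages.getRowCount() + " rows");
      }
    }
  }
}
{code}

With this pattern, a file that contains only headers and empty row groups simply yields no pages instead of failing.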
> ParquetFileReader throws a runtime exception when a file contains only
> headers and no row data
> -----------------------------------------------------------------------------------------------
>
> Key: PARQUET-2219
> URL: https://issues.apache.org/jira/browse/PARQUET-2219
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.1
> Reporter: chris stockton
> Priority: Minor
>
> Google BigQuery has an option to export table data to Parquet-formatted
> files, but some of these files are written with header data only and no row
> data. When such a file is opened with ParquetFileReader, an exception is
> thrown:
> {{RuntimeException("Illegal row group of 0 rows");}}
> It seems like ParquetFileReader should not throw an exception when it
> encounters such a file.
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L949
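
For context on the link above, the check it points to is essentially a row-count guard on the row group's footer metadata; a simplified sketch of its shape (variable names illustrative, not the exact parquet-mr source):

{code:java}
// Roughly the shape of the guard in ParquetFileReader.readNextRowGroup():
BlockMetaData block = blocks.get(currentBlock);
if (block.getRowCount() == 0) {
  // The exception reported above: the reader rejects the zero-row group
  // outright instead of skipping it or returning an empty PageReadStore.
  throw new RuntimeException("Illegal row group of 0 rows");
}
{code}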
--
This message was sent by Atlassian Jira
(v8.20.10#820010)