[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

ASF GitHub Bot (Jira) Sun, 08 Jan 2023 01:52:05 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655782#comment-17655782
 ]


ASF GitHub Bot commented on PARQUET-2219:
-----------------------------------------

wgtmac opened a new pull request, #1018:
URL: https://github.com/apache/parquet-mr/pull/1018

   ### Jira
   
   My PR addresses 
[PARQUET-2219](https://issues.apache.org/jira/browse/PARQUET/PARQUET-2219).
   
   ### Tests
   
   My PR adds the following unit test, which reads a Parquet file containing an empty row group:
   - 
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetReaderEmptyBlock.java
   
   ### Commits
   
   The Parquet specification does not forbid empty row groups, and some 
implementations generate files that contain them. This commit makes 
ParquetFileReader robust by skipping empty row groups while reading.
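
   The skipping behaviour described above can be illustrated with a minimal, 
self-contained sketch. It does not use the parquet-mr API and is not the code 
from this PR: `RowGroup`, `nextNonEmpty`, and `countRows` are hypothetical names 
introduced only for illustration, and the row counts stand in for the row-group 
metadata a real reader would consult.

```java
import java.util.List;

public class EmptyRowGroupSkipSketch {

    // Hypothetical stand-in for row-group metadata; NOT a parquet-mr class.
    static final class RowGroup {
        final long rowCount;

        RowGroup(long rowCount) {
            this.rowCount = rowCount;
        }
    }

    // Returns the index of the first row group at or after `start` that
    // contains at least one row, or -1 if every remaining group is empty.
    static int nextNonEmpty(List<RowGroup> groups, int start) {
        for (int i = start; i < groups.size(); i++) {
            if (groups.get(i).rowCount > 0) {
                return i;
            }
        }
        return -1;
    }

    // Iterates over the row groups, silently skipping empty ones instead of
    // failing on them, and returns the total number of rows seen.
    static long countRows(List<RowGroup> groups) {
        long total = 0;
        int i = nextNonEmpty(groups, 0);
        while (i >= 0) {
            total += groups.get(i).rowCount;
            i = nextNonEmpty(groups, i + 1);
        }
        return total;
    }
}
```

   In this sketch, a file whose row groups are all empty simply yields no data 
(`nextNonEmpty` returns -1 immediately) rather than raising an exception.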
   




> ParquetFileReader throws a runtime exception when a file contains only 
> headers and no row data
> -----------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2219
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2219
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.1
>            Reporter: chris stockton
>            Priority: Minor
>
> Google BigQuery has an option to export table data to Parquet-formatted 
> files, but some of these files are written with header data only.  When this 
> happens and these files are opened with the ParquetFileReader, an exception 
> is thrown:
> {{RuntimeException("Illegal row group of 0 rows");}}
> It seems like the ParquetFileReader should not throw an exception when it 
> encounters such a file.
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L949



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
