[jira] [Commented] (PARQUET-1963) DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty

ASF GitHub Bot (Jira) Wed, 20 Jan 2021 08:03:08 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268662#comment-17268662
 ]


ASF GitHub Bot commented on PARQUET-1963:
-----------------------------------------

gszadovszky merged pull request #854:
URL: https://github.com/apache/parquet-mr/pull/854


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the 
> first sub-split is empty
> --------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1963
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1963
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>            Priority: Major
>
> A follow-up to PARQUET-1947: after that fix, an NPE is thrown when the first 
> sub-split is empty in CombineFileInputFormat:
> {code}
> Caused by: java.lang.NullPointerException
>       at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154)
>       at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73)
>       at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
>       at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
>       at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>       at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>       at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
>       at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160)
>       at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
>       at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
>       ... 10 more
> {code}
> The cause is that CombineFileInputFormat uses the result of createValue() 
> from the first sub-split as the value container. Because the first sub-split 
> is empty, that value container is null.
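
The failure mode described above can be illustrated with a small standalone sketch. It does not use Hadoop or parquet-mr; the classes Container and SubSplitReader and the method readAll are hypothetical stand-ins, and the only behavior modeled is the one stated in the description: the value container comes from createValue() on the first sub-split, and an empty sub-split yields null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EmptyFirstSubSplitSketch {

    /** Hypothetical mutable value holder (a stand-in, not the parquet-mr class). */
    static final class Container {
        String current;
    }

    /** Hypothetical per-sub-split reader, loosely shaped like a mapred RecordReader. */
    static final class SubSplitReader {
        private final Iterator<String> records;
        private final Container container;

        SubSplitReader(List<String> data) {
            this.records = data.iterator();
            // Assumption taken from the issue text: an empty sub-split has no container.
            this.container = data.isEmpty() ? null : new Container();
        }

        Container createValue() {
            return container;
        }

        boolean next(Container value) {
            if (!records.hasNext()) {
                return false;
            }
            value.current = records.next(); // throws NPE when value is null
            return true;
        }
    }

    /** Reads every sub-split, taking the value container from the FIRST reader only. */
    static List<String> readAll(List<List<String>> subSplits) {
        List<String> out = new ArrayList<>();
        if (subSplits.isEmpty()) {
            return out;
        }
        Container value = new SubSplitReader(subSplits.get(0)).createValue();
        for (List<String> split : subSplits) {
            SubSplitReader reader = new SubSplitReader(split);
            while (reader.next(value)) {
                out.add(value.current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Non-empty first sub-split: the shared container is valid.
        System.out.println(readAll(Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b", "c"))));

        // Empty first sub-split: the shared container is null, so the
        // first record of the second sub-split triggers the NPE.
        try {
            readAll(Arrays.asList(
                    Collections.<String>emptyList(), Arrays.asList("b")));
        } catch (NullPointerException e) {
            System.out.println("NPE: value container from empty first sub-split was null");
        }
    }
}
```

In this sketch the empty first reader itself never fails, because it returns false before touching the container; the exception only surfaces once a later, non-empty sub-split tries to write into the null container.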



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
