[
https://issues.apache.org/jira/browse/PARQUET-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268907#comment-17268907
]
Daniel Dai commented on PARQUET-1963:
-------------------------------------
Thanks [~gszadovszky]!
> DeprecatedParquetInputFormat in CombineFileInputFormat throws NPE when the
> first sub-split is empty
> --------------------------------------------------------------------------------------------------
>
> Key: PARQUET-1963
> URL: https://issues.apache.org/jira/browse/PARQUET-1963
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Priority: Major
>
> This is a follow-up to PARQUET-1947. With that fix applied, an NPE is thrown
> when the first sub-split in CombineFileInputFormat is empty:
> {code}
> Caused by: java.lang.NullPointerException
>     at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154)
>     at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73)
>     at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
>     at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>     at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
>     at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160)
>     at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
>     at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
> ... 10 more
> {code}
> The cause is that CombineFileInputFormat uses the result of createValue() on
> the first sub-split as the value container. Because the first sub-split is
> empty, that value container is null.
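
To illustrate the failure mode described in the quoted report, here is a minimal, self-contained sketch. It deliberately does not use the Hadoop or Parquet classes: ValueReader, ListReader, sample and readAll are hypothetical stand-ins invented for this example, and the null guard shown is only one possible defensive approach, not necessarily the fix that went into parquet-mr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptySubSplitSketch {

    /** Hypothetical stand-in for the value side of a mapred-style record reader. */
    interface ValueReader {
        StringBuilder createValue();        // may return null for an empty sub-split
        boolean next(StringBuilder value);  // fills the caller-supplied container
    }

    /** Reader over one sub-split; an empty sub-split has no container to offer. */
    static final class ListReader implements ValueReader {
        private final List<String> rows;
        private int pos = 0;

        ListReader(List<String> rows) { this.rows = rows; }

        @Override
        public StringBuilder createValue() {
            return rows.isEmpty() ? null : new StringBuilder();
        }

        @Override
        public boolean next(StringBuilder value) {
            if (pos >= rows.size()) return false;
            value.setLength(0);             // throws NPE if the container is null
            value.append(rows.get(pos++));
            return true;
        }
    }

    /** Fresh readers: the first sub-split is empty, the second holds two rows. */
    static List<ValueReader> sample() {
        List<ValueReader> readers = new ArrayList<>();
        readers.add(new ListReader(Collections.<String>emptyList()));
        readers.add(new ListReader(Arrays.asList("a", "b")));
        return readers;
    }

    /** Reads every sub-split, reusing the container created by the first reader. */
    static List<String> readAll(List<ValueReader> readers, boolean guardNull) {
        List<String> out = new ArrayList<>();
        if (readers.isEmpty()) return out;
        StringBuilder value = readers.get(0).createValue();
        if (value == null && guardNull) {
            value = new StringBuilder();    // assumed workaround: fall back to a fresh container
        }
        for (ValueReader reader : readers) {
            while (reader.next(value)) out.add(value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            readAll(sample(), false);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded: " + readAll(sample(), true));
    }
}
```

In the unguarded call, the empty first reader is skipped without touching the container, so the exception only surfaces once the second reader tries to fill it, which matches the trace pointing at next() rather than createValue().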