[
https://issues.apache.org/jira/browse/HADOOP-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313851#comment-14313851
]
Steve Loughran commented on HADOOP-11202:
-----------------------------------------
Corby: if you are using Amazon's own S3 client, be aware that it is not the
Apache one; at some point they forked it.
> SequenceFile crashes with encrypted files that are shorter than
> FileSystem.getStatus(path)
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-11202
> URL: https://issues.apache.org/jira/browse/HADOOP-11202
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.2.0
> Environment: Amazon EMR 3.0.4
> Reporter: Corby Wilson
>
> Encrypted files are often padded to allow for proper encryption on a 2^n-bit
> boundary. As a result, the encrypted file might be a few bytes bigger than
> the unencrypted file.
> We have a case where an encrypted file is 2 bytes bigger due to padding.
> When we run a HIVE job on the file to get a record count (select count(*)
> from <table>) it runs org.apache.hadoop.mapred.SequenceFileRecordReader and
> loads the file in through a custom FS InputStream.
> The InputStream decrypts the file as it gets read in. Splits are properly
> handled, as it implements both Seekable and PositionedReadable.
> When the org.apache.hadoop.io.SequenceFile class initializes, it reads in the
> file size from the file metadata, which returns the size of the encrypted
> file on disk (or, in this case, in S3).
> However, the actual file size is 2 bytes less, so the InputStream will return
> EOF (-1) before the SequenceFile thinks it's done.
> As a result, the SequenceFile$Reader tries to run next->readRecordLength
> after the stream has been exhausted, and we get a crash.
> The SequenceFile class SHOULD instead pay attention to the EOF marker from
> the stream, rather than the file size reported in the metadata, and set the
> 'more' flag accordingly.
> Sample stack dump from crash:
> 2014-10-10 21:25:27,160 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.EOFException
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.io.EOFException
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2332)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2363)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2500)
> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more
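The failure mode quoted above can be reproduced in miniature: DataInputStream.readInt throws EOFException the moment the underlying stream runs out, which is exactly what happens when the decrypted stream is shorter than the length the metadata promised. A minimal sketch of the suggested behaviour (this is not the actual SequenceFile code; the class and helper names here are hypothetical), turning the stream's EOF into a "no more records" signal rather than a crash:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    // Hypothetical helper mirroring the proposed fix: read the next record
    // length, but treat a stream EOF as "no more records" instead of letting
    // EOFException escape (i.e. set the 'more' flag to false).
    static int readRecordLength(DataInputStream in) throws IOException {
        try {
            return in.readInt(); // record length is stored as a 4-byte int
        } catch (EOFException eof) {
            return -1;           // signal end of data, not an error
        }
    }

    public static void main(String[] args) throws IOException {
        // Metadata might claim more bytes than the decrypted stream holds:
        // here the stream contains exactly one 4-byte int and nothing else,
        // so the second readInt() hits EOF mid-read.
        byte[] data = {0, 0, 0, 42};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readRecordLength(in)); // 42
        System.out.println(readRecordLength(in)); // -1 (EOF handled, no crash)
    }
}
```

Without the try/catch, the second call would surface the same java.io.EOFException seen at the bottom of the stack trace above.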
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)