[
https://issues.apache.org/jira/browse/HADOOP-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198868#comment-14198868
]
Venkata Puneet Ravuri commented on HADOOP-11270:
------------------------------------------------
Thanks for your inputs!
[[email protected]], my responses:
1. I am currently using Hadoop 2.5.1.
2. I am trying seek(len(file)).
3. No, the file size is more than 1MB.
I understand that behavior across file systems can be different. But I believe
seek(<length of file>) should be supported by s3n as well.
I have noticed that the seek() method in NativeS3FsInputStream creates a new input
stream by performing a getObject() starting from the seek position. This fails when
the seek position is the length of the file. Instead we could do this:
a. If the new seek position is greater than the current position of the stream,
skip the difference in the underlying input stream.
b. If the new seek position is less than the current position of the stream,
get a new input stream starting from this position.
I tested this change and it's working. Please let me know your thoughts on this.
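To make (a) and (b) concrete, here is a minimal sketch of the idea, not the actual patch. RangeOpener, ForwardSkippingStream and DemoStore are hypothetical names used only for illustration; RangeOpener stands in for the store's ranged getObject() call, and DemoStore mimics the current failure by rejecting a range that starts at or past the end of the data.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a ranged getObject(): opens a stream starting at offset. */
interface RangeOpener {
    InputStream open(long offset) throws IOException;
}

/** Sketch of the proposed seek(): skip forward in place, reopen only when going backward. */
class ForwardSkippingStream extends InputStream {
    private final RangeOpener opener;
    private InputStream in;
    private long pos;

    ForwardSkippingStream(RangeOpener opener) throws IOException {
        this.opener = opener;
        this.in = opener.open(0);
        this.pos = 0;
    }

    public void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new EOFException("Cannot seek to a negative position");
        }
        if (newPos == pos) {
            return;
        }
        if (newPos > pos) {
            // (a) Forward seek: consume the difference from the stream we already have.
            long remaining = newPos - pos;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    // skip() may return 0 before EOF, so probe with a single read.
                    if (in.read() < 0) {
                        throw new EOFException("Attempted to seek past the end of the file");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
                pos += skipped;
            }
        } else {
            // (b) Backward seek: a new ranged request is unavoidable.
            in.close();
            in = opener.open(newPos);
            pos = newPos;
        }
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    public long getPos() {
        return pos;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

/** In-memory store for the demo; rejects ranges starting at or past the end. */
class DemoStore {
    static RangeOpener opener(byte[] data) {
        return offset -> {
            if (offset >= data.length) {
                throw new EOFException("Attempted to seek or read past the end of the file");
            }
            return new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
        };
    }
}
```

With this shape, seek(<length of file>) reached by moving forward never issues a ranged request, so the stream just sits at EOF and the next read() returns -1. A backward seek still goes through the opener, so seeking backward to exactly the file length (only possible after reading past it, which this sketch does not allow) is not a case it needs to handle.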
One impact of the current behavior is that Hive reads of RCFiles stored in S3 fail
when Hive tries to skip columns by issuing skipBytes() on this input stream.
> Seek behavior difference between NativeS3FsInputStream and DFSInputStream
> -------------------------------------------------------------------------
>
> Key: HADOOP-11270
> URL: https://issues.apache.org/jira/browse/HADOOP-11270
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Venkata Puneet Ravuri
> Assignee: Venkata Puneet Ravuri
>
> There is a difference in behavior while seeking a given file present
> in S3 using NativeS3FileSystem$NativeS3FsInputStream and a file present in
> HDFS using DFSInputStream.
> If we seek to the end of the file in the case of NativeS3FsInputStream, it fails
> with exception "java.io.EOFException: Attempted to seek or read past the end
> of the file". That is because a getObject request is issued on the S3 object
> with range start as value of length of file.
> This is the complete exception stack:-
> Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
> at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
> at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
> at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
> at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
> ... 15 more