[
https://issues.apache.org/jira/browse/HADOOP-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198066#comment-14198066
]
Steve Loughran commented on HADOOP-11270:
-----------------------------------------
# which version of Hadoop is this? Hadoop 2.4 had a broken seek
# are you trying to do {{seek(len(file))}} or {{seek(len(file)-1)}}? (see the sketch below)
# is the file 0 bytes long?
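To make question 2 concrete, here is a minimal sketch (not from this issue; the path argument and configuration are placeholders) that seeks to both offsets on an {{FSDataInputStream}}:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekToEndCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder path, e.g. s3n://bucket/key or hdfs://namenode/path/file
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(new Configuration());
    long len = fs.getFileStatus(path).getLen();

    try (FSDataInputStream in = fs.open(path)) {
      if (len > 0) {
        in.seek(len - 1);                        // last byte: should succeed everywhere
        System.out.println("read at len-1: " + in.read());
      }
      try {
        in.seek(len);                            // exactly EOF: the contested case
        System.out.println("seek(len) accepted; read() returned " + in.read());
      } catch (IOException e) {
        System.out.println("seek(len) rejected: " + e);
      }
    }
  }
}
{code}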
I would recommend you test on Hadoop 2.5.1; the semantics of filesystem access
have changed, as Thomas pointed out. And {{seek()}} turned out to be the most
inconsistent of them (exceptions, actions on a negative value, seeking to the
current position, etc.).
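As a hedged illustration of those corner cases, a quick probe (again only a sketch; the path is a placeholder) might look like this:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekCornerCases {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);               // placeholder path
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      long pos = in.getPos();
      in.seek(pos);                              // re-seek the current position: expected to be a no-op
      System.out.println("re-seek kept position: " + (in.getPos() == pos));

      try {
        in.seek(-1);                             // negative offset: streams disagree on what, if anything, they throw
        System.out.println("seek(-1) accepted");
      } catch (Exception e) {                    // catch broadly because the exception type varies by stream
        System.out.println("seek(-1) rejected with " + e.getClass().getSimpleName());
      }
    }
  }
}
{code}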
Finally, yes, filesystems are different, and that's a WONTFIX. For example, a
native filesystem doesn't raise any exception on the seek, only on the
following read(). HDFS and others do fail fast on the seek, which is why I'm
surprised you are seeing a difference between HDFS and S3N; both are going to
reject on the seek().
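For reference, here is a sketch of the kind of cross-filesystem probe I mean (the path is a placeholder; run it against file://, hdfs:// and s3n:// paths to compare where the failure surfaces):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekPastEofProbe {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);               // e.g. file:///tmp/f, hdfs://nn/f, s3n://bucket/f
    FileSystem fs = path.getFileSystem(new Configuration());
    long len = fs.getFileStatus(path).getLen();

    try (FSDataInputStream in = fs.open(path)) {
      try {
        in.seek(len + 1);                        // past EOF
        // a local filesystem may accept the seek and only fail (or return -1) on the read
        System.out.println("seek past EOF accepted; read() returned " + in.read());
      } catch (IOException e) {
        // HDFS and S3N are expected to fail fast here
        System.out.println("failed fast on seek(): " + e);
      }
    }
  }
}
{code}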
> Seek behavior difference between NativeS3FsInputStream and DFSInputStream
> -------------------------------------------------------------------------
>
> Key: HADOOP-11270
> URL: https://issues.apache.org/jira/browse/HADOOP-11270
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Venkata Puneet Ravuri
> Assignee: Venkata Puneet Ravuri
>
> There is a difference in behavior when seeking within a file stored in S3
> using NativeS3FileSystem$NativeS3FsInputStream and a file stored in HDFS
> using DFSInputStream.
> If we seek to the end of the file in the case of NativeS3FsInputStream, it
> fails with the exception "java.io.EOFException: Attempted to seek or read
> past the end of the file". That is because a getObject request is issued on
> the S3 object with the range start set to the length of the file.
> This is the complete exception stack:-
> Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
> at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
> at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
> at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
> at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
> at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
> at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
> ... 15 more