[ https://issues.apache.org/jira/browse/HADOOP-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087169#comment-15087169 ]
Steve Loughran commented on HADOOP-12689:
-----------------------------------------
-1
Ravi, please, no: not without new tests to show this problem is fixed. If the
last fix was a regression, then we need something more to show the regression
has gone away. That doesn't have to be a general-purpose contract test;
something in the hadoop-tools/hadoop-aws package will do.
Look at the Jenkins report:
{code}
The patch doesn't appear to include any new or modified tests. Please justify
why no new tests are needed for this patch. Also please list what manual steps
were performed to verify this patch.
{code}
S3n and its siblings are a sore point in the Hadoop codebase: under-maintained,
under-tested and incredibly brittle to change.
If HADOOP-10542 did break things (and I trust your claim there), then it slipped
through the current S3 test suite. We need another test to make sure this
problem never comes back. It doesn't have to be a full contract test; something
in hadoop-aws will be enough. But saying "we can add a test later" isn't the
right tactic: we both know "later" means "never" in this context. We also need
all the existing S3 tests to be run, to make sure this patch doesn't change
anything else either.
Can we roll this back and do another iteration of the patch that does include
a test? Mandating the "patches include tests" policy is the only way we can
keep test coverage up, especially on something this brittle.
Sorry.
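For illustration only, here is a rough sketch of the kind of regression check being asked for. In hadoop-aws it would be a JUnit test against a real S3FileSystem and test bucket; here a hypothetical `StatusLookup` interface stands in for the two FileSystem calls so the sketch runs on its own. The assumption, taken from the issue report, is that a missing path must surface from `getFileStatus()` as `FileNotFoundException` (not a bare `IOException`), and that `exists()` then returns false.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical minimal view of the two FileSystem calls the regression broke.
interface StatusLookup {
    String getFileStatus(String path) throws IOException;

    // Same shape as FileSystem.exists(): only FileNotFoundException means "absent".
    default boolean exists(String path) throws IOException {
        try {
            return getFileStatus(path) != null;
        } catch (FileNotFoundException e) {
            return false;
        }
    }
}

// Hypothetical regression check: a missing path must be reported as
// FileNotFoundException, never as a plain IOException.
final class MissingPathRegressionCheck {
    private MissingPathRegressionCheck() {}

    static void check(StatusLookup fs, String missingPath) throws IOException {
        try {
            fs.getFileStatus(missingPath);
            throw new AssertionError("getFileStatus succeeded on " + missingPath);
        } catch (FileNotFoundException expected) {
            // Correct contract: callers can tell "not found" apart from real I/O errors.
        } catch (IOException wrongType) {
            // This is the HADOOP-10542 symptom: a generic IOException for a missing path.
            throw new AssertionError("expected FileNotFoundException, got " + wrongType, wrongType);
        }
        if (fs.exists(missingPath)) {
            throw new AssertionError("exists() returned true for " + missingPath);
        }
    }
}
```

A store that throws `FileNotFoundException` for a missing key passes; one that throws a plain `IOException("/test doesn't exist")` fails with an `AssertionError`, which is exactly the behaviour the existing suite let through.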
> S3 filesystem operations stopped working correctly
> --------------------------------------------------
>
> Key: HADOOP-12689
> URL: https://issues.apache.org/jira/browse/HADOOP-12689
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools
> Affects Versions: 2.7.0
> Reporter: Matthew Paduano
> Assignee: Matthew Paduano
> Labels: S3
> Fix For: 2.8.0
>
> Attachments: HADOOP-12689.01.patch
>
>
> HADOOP-10542 was resolved by replacing "return null;" with throwing
> IOException. This causes several S3 filesystem operations to fail (possibly
> more code is expecting that null return value; these are just the calls I
> noticed):
> S3FileSystem.getFileStatus() (which no longer raises FileNotFoundException
> but instead IOException)
> FileSystem.exists() (which no longer returns false but instead raises
> IOException)
> S3FileSystem.create() (which no longer succeeds but instead raises
> IOException)
> Run command:
> hadoop distcp hdfs://localhost:9000/test s3://xxx:[email protected]/
> Resulting stack trace:
> 2015-12-11 10:04:34,030 FATAL [IPC Server handler 6 on 44861] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1449826461866_0005_m_000006_0 - exited : java.io.IOException: /test doesn't exist
> at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
> at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy17.retrieveINode(Unknown Source)
> at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:230)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Changing the "raise IOE..." back to "return null" fixes all of the above call
> sites and allows distcp to succeed.
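To make the failure mode in the report concrete, here is a small self-contained model. It is a sketch, not Hadoop code: `InodeStore` and `MiniS3FileSystem` are hypothetical stand-ins. It assumes, as the report describes, that the store signals "no such key" by returning null, that `getFileStatus()` turns null into `FileNotFoundException`, and that `exists()` only catches `FileNotFoundException`. The `throwOnMissing` flag models the HADOOP-10542 change.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the S3 inode store; not the real Jets3tFileSystemStore.
class InodeStore {
    private final Map<String, String> inodes = new HashMap<>();
    // true models the HADOOP-10542 behaviour: throw on a missing key instead of returning null.
    private final boolean throwOnMissing;

    InodeStore(boolean throwOnMissing) {
        this.throwOnMissing = throwOnMissing;
    }

    void put(String path, String inode) {
        inodes.put(path, inode);
    }

    // Returns null for an absent key (the contract callers expect),
    // or throws a plain IOException when throwOnMissing is set.
    String retrieveINode(String path) throws IOException {
        String inode = inodes.get(path);
        if (inode == null && throwOnMissing) {
            throw new IOException(path + " doesn't exist");
        }
        return inode;
    }
}

// Hypothetical filesystem wrapper mirroring the getFileStatus()/exists() contract.
class MiniS3FileSystem {
    private final InodeStore store;

    MiniS3FileSystem(InodeStore store) {
        this.store = store;
    }

    String getFileStatus(String path) throws IOException {
        String inode = store.retrieveINode(path);
        if (inode == null) {
            // Callers rely on this specific subclass to mean "not found".
            throw new FileNotFoundException(path + ": No such file or directory.");
        }
        return inode;
    }

    // Only FileNotFoundException is treated as "absent"; any other IOException escapes.
    boolean exists(String path) throws IOException {
        try {
            return getFileStatus(path) != null;
        } catch (FileNotFoundException e) {
            return false;
        }
    }
}
```

With `new InodeStore(false)`, `exists("/test")` on a missing path returns false. With `new InodeStore(true)`, the plain `IOException` is never translated into `FileNotFoundException`, so it escapes `exists()`, which matches the distcp failure in the stack trace above.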
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)