Shane Kumpf created MAPREDUCE-5740:
--------------------------------------
Summary: Shuffle error when the MiniMRYARNCluster work path
contains special characters
Key: MAPREDUCE-5740
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5740
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Shane Kumpf
Priority: Minor
Tests that use MiniMRYarnCluster fail during the Jenkins build, but the same
tests pass on local workstations.
The exception is as follows:
{quote}
2014-01-30 10:59:28,649 ERROR [ShuffleHandler.java:510] Shuffle error :
java.io.IOException: Error Reading IndexFile
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
    at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:592)
    at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /home/sitebuild/jenkins/workspace/%7Binventory-engineering%7D-snapshot-workflow-%7BS7274%7D/target/Integration-Tests/Integration-Tests-localDir-nm-0_2/usercache/sitebuild/appcache/application_1391108343099_0001/output/attempt_1391108343099_0001_m_000000_0/file.out.index
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:210)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
    at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
    at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
    ... 32 more
{quote}
It was found that org.apache.hadoop.mapred.SpillRecord calls
toUri().getRawPath() on the indexFileName Path object (line 71). Jenkins uses
{} to denote team and branch. These {} characters are percent-encoded in the
raw path (as %7B and %7D, visible in the path in the exception above), so the
file cannot be found and the FileNotFoundException is thrown during the
shuffle phase.
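To illustrate the encoding, here is a minimal sketch that uses only the JDK
rather than Hadoop. It assumes that Path.toUri() treats illegal path
characters the same way the multi-argument java.net.URI constructor does. The
class RawPathDemo, its helper methods, and the directory names are
hypothetical and exist only for this example.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;

// Hypothetical demo class; JDK only, no Hadoop classes involved.
final class RawPathDemo {

    // The multi-argument URI constructor percent-encodes characters that are
    // illegal in a URI path, such as '{' and '}'.
    static URI toUri(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Encoded form: escapes such as %7B are kept.
    static String rawPath(String path) {
        return toUri(path).getRawPath();
    }

    // Decoded form: escapes are turned back into the original characters.
    static String decodedPath(String path) {
        return toUri(path).getPath();
    }

    public static void main(String[] args) throws IOException {
        // Create a real file below a directory whose name contains braces,
        // similar to the Jenkins workspace name in the exception.
        File base = Files.createTempDirectory("shuffle-demo").toFile();
        File dir = new File(base, "{team}-workflow-{branch}");
        File index = new File(dir, "file.out.index");
        if (!dir.mkdir() || !index.createNewFile()) {
            throw new IOException("could not create " + index);
        }

        String absolute = index.getAbsolutePath();
        // The raw path names a directory that does not exist on disk.
        System.out.println("raw path exists:     " + new File(rawPath(absolute)).exists());
        // The decoded path names the file that was just created.
        System.out.println("decoded path exists: " + new File(decodedPath(absolute)).exists());
    }
}
```

If the same holds for Path.toUri(), building the File from the decoded path
instead of the raw path would avoid the lookup failure, although that has not
been verified against the Hadoop code here.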
Interestingly, the SpillRecord constructor is as follows. It seems a little
strange to call Path.toUri() so high up in the call chain:
{code}
public SpillRecord(Path indexFileName, JobConf job, Checksum crc,
    String expectedIndexOwner) throws IOException {
  final FileSystem rfs = FileSystem.getLocal(job).getRaw();
  final FSDataInputStream in =
      SecureIOUtils.openFSDataInputStream(
          new File(indexFileName.toUri().getRawPath()), expectedIndexOwner, null);
  ....
}
{code}
and SecureIOUtils creates a Path from the File object (!):
{code}
public static FSDataInputStream openFSDataInputStream(File file,
    String expectedOwner, String expectedGroup) throws IOException {
  if (!UserGroupInformation.isSecurityEnabled()) {
    // Note: the File is converted back into a Path here.
    return rawFilesystem.open(new Path(file.getAbsolutePath()));
  }
  return forceSecureOpenFSDataInputStream(file, expectedOwner, expectedGroup);
}
{code}
The rawFilesystem.open(Path) call above goes through the abstract class
FileSystem, which delegates to a concrete subclass at runtime. That subclass
could be any of:
• ChRootedFileSystem
• ChecksumFileSystem
• DistributedFileSystem
• FtpFileSystem
• WebHdfsFileSystem
• and others
URL escaping makes sense for the WebHdfsFileSystem and some others, but not for
all of them. It would make more sense to URL escape only within the FileSystem
implementations that require it.
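As a rough illustration of that idea, the sketch below keeps escaping inside
the one implementation that needs it. None of these types exist in Hadoop:
PathResolver, LocalResolver, and HttpResolver are hypothetical stand-ins for
FileSystem implementations, and the host name is a placeholder.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for a FileSystem-like abstraction.
interface PathResolver {
    // Returns the string this implementation would use to locate the file.
    String resolve(String path);
}

// A local implementation has no reason to escape anything: the path is
// handed to the operating system exactly as given.
final class LocalResolver implements PathResolver {
    @Override
    public String resolve(String path) {
        return path;
    }
}

// An HTTP-based implementation must produce a valid URL, so it is the one
// place where percent-encoding is applied.
final class HttpResolver implements PathResolver {
    private final String host;

    HttpResolver(String host) {
        this.host = host;
    }

    @Override
    public String resolve(String path) {
        try {
            // The four-argument constructor (scheme, host, path, fragment)
            // quotes characters that are illegal in a URI path.
            return new URI("http", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this split, callers pass the same unescaped path to either
implementation, and only the HTTP variant ever sees an encoded form.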
Also of note: MiniMRYarnCluster allows most of the directories it uses to be
changed via org.apache.hadoop.yarn.conf.YarnConfiguration; however, testWorkDir
is not one of them. testWorkDir is hardcoded as follows in
org.apache.hadoop.yarn.server.MiniYARNCluster:
{code}
public MiniYARNCluster(String testName, int noOfNodeManagers,
    int numLocalDirs, int numLogDirs) {
  super(testName.replace("$", ""));
  this.numLocalDirs = numLocalDirs;
  this.numLogDirs = numLogDirs;
  this.testWorkDir = new File("target",
      testName.replace("$", ""));
  ....
}
{code}
If modifications to SpillRecord are undesirable, allowing testWorkDir to be
configurable might be a good workaround.
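One possible shape for such a workaround is sketched below. It is not a patch
against MiniYARNCluster: the class TestWorkDirResolver and the property name
test.minicluster.work.dir are hypothetical, and a JVM system property stands in
for a YarnConfiguration setting to keep the example self-contained.

```java
import java.io.File;

// Hypothetical helper; not part of MiniYARNCluster.
final class TestWorkDirResolver {

    // Hypothetical property name, chosen for this sketch only.
    static final String WORK_DIR_PROPERTY = "test.minicluster.work.dir";

    // Returns the work directory for a test. The base directory defaults to
    // "target", matching the hardcoded value, unless the property is set.
    static File resolve(String testName) {
        String cleanName = testName.replace("$", "");
        String base = System.getProperty(WORK_DIR_PROPERTY, "target");
        return new File(base, cleanName);
    }
}
```

A build whose workspace path contains special characters could then point the
property at a directory with a plain name, while other builds keep the default.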