Shane Kumpf created MAPREDUCE-5740:
--------------------------------------

             Summary: Shuffle error when the MiniMRYARNCluster work path contains special characters
                 Key: MAPREDUCE-5740
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5740
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Shane Kumpf
            Priority: Minor


When running tests that leverage MiniMRYARNCluster, a failure occurs during the Jenkins build; however, the same tests pass on local workstations.

The exception found is as follows: 
{quote}
2014-01-30 10:59:28,649 ERROR [ShuffleHandler.java:510] Shuffle error :
java.io.IOException: Error Reading IndexFile
        at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
        at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:592)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
        at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /home/sitebuild/jenkins/workspace/%7Binventory-engineering%7D-snapshot-workflow-%7BS7274%7D/target/Integration-Tests/Integration-Tests-localDir-nm-0_2/usercache/sitebuild/appcache/application_1391108343099_0001/output/attempt_1391108343099_0001_m_000000_0/file.out.index
        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:210)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
        at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
        at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
        at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
        ... 32 more
{quote}


It was found that org.apache.hadoop.mapred.SpillRecord calls toUri() on the indexFileName Path object (line 71). Jenkins uses {} in the workspace name to denote team and branch, and these {} characters end up URL encoded, which causes the FileNotFoundException during the shuffle phase.
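
For illustration, a minimal standalone sketch (the workspace path below is a made-up stand-in for the real Jenkins one) of how the braces come back percent-encoded when the raw URI path is used:

{code}
import org.apache.hadoop.fs.Path;

// Sketch only: "{team}-{branch}" is a placeholder for the Jenkins workspace naming scheme.
public class PathEncodingDemo {
  public static void main(String[] args) {
    Path indexFileName =
        new Path("/home/jenkins/workspace/{team}-{branch}/target/file.out.index");

    // Raw (still percent-encoded) form -- this is what SpillRecord hands to java.io.File:
    System.out.println(indexFileName.toUri().getRawPath());
    // prints something like: /home/jenkins/workspace/%7Bteam%7D-%7Bbranch%7D/target/file.out.index

    // Decoded form -- this matches the directory that actually exists on disk:
    System.out.println(indexFileName.toUri().getPath());
    // prints: /home/jenkins/workspace/{team}-{branch}/target/file.out.index
  }
}
{code}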

Interestingly, the relevant code snippet is as follows; it seems a little strange to be doing the Path.toUri() conversion so high up in the call chain:

{code}
public SpillRecord(Path indexFileName, JobConf job, Checksum crc,
                   String expectedIndexOwner) throws IOException {

    final FileSystem rfs = FileSystem.getLocal(job).getRaw();

    final FSDataInputStream in =
        SecureIOUtils.openFSDataInputStream(
            new File(indexFileName.toUri().getRawPath()),
            expectedIndexOwner, null);
    ....
}
{code}

and SecureIOUtils creates a Path from the File object via {{new Path(file.getAbsolutePath())}} (!):

{code}
public static FSDataInputStream openFSDataInputStream(File file,
      String expectedOwner, String expectedGroup) throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      return rawFilesystem.open(new Path(file.getAbsolutePath()));
    }
    return forceSecureOpenFSDataInputStream(file, expectedOwner, expectedGroup);
  }
{code}

The rawFilesystem.open(Path) call above goes through the abstract FileSystem class, which delegates to a concrete subclass at runtime; that subclass could be any of:
* ChRootedFileSystem
* ChecksumFileSystem
* DistributedFileSystem
* FtpFileSystem
* WebHdfsFileSystem
* and others

URL escaping makes sense for WebHdfsFileSystem and some of the others, but not for all of them. It would seem better to URL escape only within the FileSystem implementations that actually require it.
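
One possible change along those lines (just a sketch, not a tested patch) would be for SpillRecord to hand the decoded path to java.io.File and leave any escaping to the FileSystem implementations that need it:

{code}
// Sketch only: use the decoded path for the local File instead of the raw (encoded) one.
final FSDataInputStream in =
    SecureIOUtils.openFSDataInputStream(
        new File(indexFileName.toUri().getPath()),   // getPath() instead of getRawPath()
        expectedIndexOwner, null);
{code}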

Also of note: MiniMRYarnCluster allows most of the directories it uses to be changed via org.apache.hadoop.yarn.conf.YarnConfiguration; however, testWorkDir is not one of them. testWorkDir is hardcoded as follows in org.apache.hadoop.yarn.server.MiniYARNCluster:

{code}
public MiniYARNCluster(String testName, int noOfNodeManagers,
                         int numLocalDirs, int numLogDirs) {
    super(testName.replace("$", ""));
    this.numLocalDirs = numLocalDirs;
    this.numLogDirs = numLogDirs;
    this.testWorkDir = new File("target",
        testName.replace("$", ""));
....
}
{code}

If modifications to SpillRecord are undesirable, allowing testWorkDir to be 
configurable might be a good workaround.
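
For example, the constructor could fall back to a configurable base directory instead of the hardcoded "target" (the property name below is made up purely for illustration):

{code}
// Sketch only: "yarn.minicluster.test-work-dir" is a hypothetical system property.
String baseDir = System.getProperty("yarn.minicluster.test-work-dir", "target");
this.testWorkDir = new File(baseDir, testName.replace("$", ""));
{code}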



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
