[
https://issues.apache.org/jira/browse/MAPREDUCE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726660#comment-14726660
]
Vlad Sharanhovich commented on MAPREDUCE-6363:
----------------------------------------------
So, theoretically the task ID of a map task could be used: it is stable because control files cannot currently be merged by Hadoop, so the number of map tasks is guaranteed to be at least (in this case exactly) equal to the number of control input files. That said, I strongly advise against it, because you would be relying on the Hadoop framework being unable to merge inputs, which is a big assumption; as I recall there is a JIRA tracking the implementation of rack-aware input merging. My point is that while the task ID works right now (and maybe even in any future version), there is no guarantee that Hadoop will not change this behaviour at some point.
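For clarity, this is roughly what the task-ID approach described above (the one I advise against) would look like. It is only an illustration, not code from any attached patch; the class and method names are made up, and it simply parses the task attempt ID that the framework already sets in the job configuration:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Illustrative sketch only: derive a per-mapper number from the task attempt
// ID (e.g. attempt_1371782327901_0001_m_000048_0 -> 48). This is stable only
// as long as one control file maps to exactly one map task.
public class TaskIdSketch {
  static int mapTaskIndex(JobConf conf) {
    // "mapreduce.task.attempt.id" is the 2.x property name; "mapred.task.id"
    // is the deprecated alias.
    String attempt = conf.get("mapreduce.task.attempt.id");
    return TaskAttemptID.forName(attempt).getTaskID().getId();
  }
}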
On the other hand, generating a unique ID and passing it to each task, instead of relying on the control-file layout as I have proposed in my implementation, is the "natural" way: we simply use a value that is already there but was, for some reason, set to 0 and never used before. One central function is responsible for generating the unique numbers, and each downstream task is guaranteed to get a unique ID no matter what. I honestly don't see any reason to parse a task name and convert it into a task ID when each mapper gets the value as input anyway. The IDs don't even need to be sequential, just unique!
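And here is a rough sketch of that idea, again not the attached patch (the file names and writer layout are illustrative): a single central counter assigns the unique IDs while the control files are written, and each mapper then reads its ID from the input value that used to be hard-coded to 0.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Illustrative sketch only: one central counter hands out the unique IDs
// while the control files are written; the mapper then receives the ID as
// its input value and never has to parse its task name.
public class ControlFileSketch {
  static void writeControlFiles(Configuration conf, Path controlDir,
                                int numMaps) throws IOException {
    long uniqueId = 0; // the single source of unique numbers
    for (int i = 0; i < numMaps; i++) {
      Path file = new Path(controlDir, "NNBench_Controlfile_" + i);
      try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(file),
          SequenceFile.Writer.keyClass(Text.class),
          SequenceFile.Writer.valueClass(LongWritable.class))) {
        // The LongWritable value used to be written as 0 and ignored;
        // here it carries the unique ID the mapper will use.
        writer.append(new Text("file_" + i + "_"), new LongWritable(uniqueId++));
      }
    }
  }
}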
One more remark: the new reduce code now collects statistics from all mappers/reducers; before, that was broken.
Another remark: I had a lot of trouble figuring out why NNBench did not work, so I have also expanded the error reporting.
All that said, the version of the code I have uploaded has been running successfully on a daily basis in our lab cluster and produces stable results, meaning it is fully tested in a production environment, and I highly recommend using this approach.
> [NNBench] Lease mismatch error when running with multiple mappers
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6363
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6363
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks
> Reporter: Brahma Reddy Battula
> Assignee: Vlad Sharanhovich
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS4929.patch, MAPREDUCE-6363-001.patch,
> MAPREDUCE-6363-002.patch, MAPREDUCE-6363-003.patch, nnbench.log
>
>
> Command :
> ./yarn jar
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar
> nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456
> -bytesToWrite 1024000000 -baseDir /benchmarks/NNBench`hostname -s`
> -replicationFactorPerFile 3 -maps 100 -reduces 10
> Trace :
> 2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
> 192.168.105.214:36320: error:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)