[
https://issues.apache.org/jira/browse/MAPREDUCE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726660#comment-14726660
]
Vlad Sharanhovich commented on MAPREDUCE-6363:
----------------------------------------------
So, theoretically the task ID of a map task could be used: it is stable because control files cannot currently be merged by Hadoop, so the number of map tasks is guaranteed to be at least (in this case exactly) equal to the number of control input files. That said, I strongly advise against it, because you would be relying on the Hadoop framework being unable to merge inputs, which is a big assumption; as I recall there is a JIRA tracking the implementation of rack-aware input merging. My point is that while the task ID works right now (and maybe even in any future version), there is no guarantee that Hadoop will not change this behaviour at some point.
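For clarity, this is roughly what the task-ID approach described above (the one I advise against) would look like. It is only an illustration, not code from any attached patch; the class and method names are made up, and it simply parses the task attempt ID that the framework already sets in the job configuration:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Illustrative sketch only: derive a per-mapper number from the task attempt
// ID (e.g. attempt_1371782327901_0001_m_000048_0 -> 48). This is stable only
// as long as one control file maps to exactly one map task.
public class TaskIdSketch {
  static int mapTaskIndex(JobConf conf) {
    // "mapreduce.task.attempt.id" is the 2.x property name; "mapred.task.id"
    // is the deprecated alias.
    String attempt = conf.get("mapreduce.task.attempt.id");
    return TaskAttemptID.forName(attempt).getTaskID().getId();
  }
}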
On the other hand, generating a unique ID and passing it to each task, instead of relying on the control-file layout as I have proposed in my implementation, is the "natural" way: we simply use a value that is already there but was, for some reason, set to 0 and never used before. One central function is responsible for generating the unique numbers, and each downstream task is guaranteed to get a unique ID no matter what. I honestly don't see any reason to parse a task name and convert it into a task ID when each mapper gets the value as input anyway. The IDs don't even need to be sequential, just unique!
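And here is a rough sketch of that idea, again not the attached patch (the file names and writer layout are illustrative): a single central counter assigns the unique IDs while the control files are written, and each mapper then reads its ID from the input value that used to be hard-coded to 0.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Illustrative sketch only: one central counter hands out the unique IDs
// while the control files are written; the mapper then receives the ID as
// its input value and never has to parse its task name.
public class ControlFileSketch {
  static void writeControlFiles(Configuration conf, Path controlDir,
                                int numMaps) throws IOException {
    long uniqueId = 0; // the single source of unique numbers
    for (int i = 0; i < numMaps; i++) {
      Path file = new Path(controlDir, "NNBench_Controlfile_" + i);
      try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(file),
          SequenceFile.Writer.keyClass(Text.class),
          SequenceFile.Writer.valueClass(LongWritable.class))) {
        // The LongWritable value used to be written as 0 and ignored;
        // here it carries the unique ID the mapper will use.
        writer.append(new Text("file_" + i + "_"), new LongWritable(uniqueId++));
      }
    }
  }
}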
One more remark: the new reduce code now collects statistics from all mappers/reducers; before, that was broken.
Another remark: I had a lot of trouble figuring out why NNBench did not work, so I have also expanded the error reporting.
All that said, the version of the code I have uploaded has been running successfully on a daily basis in our lab cluster and produces stable results, meaning it is fully tested in a production environment, and I highly recommend using this approach.
> [NNBench] Lease mismatch error when running with multiple mappers
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-6363
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6363
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks
> Reporter: Brahma Reddy Battula
> Assignee: Vlad Sharanhovich
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS4929.patch, MAPREDUCE-6363-001.patch,
> MAPREDUCE-6363-002.patch, MAPREDUCE-6363-003.patch, nnbench.log
>
>
> Command :
> ./yarn jar
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar
> nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456
> -bytesToWrite 1024000000 -baseDir /benchmarks/NNBench`hostname -s`
> -replicationFactorPerFile 3 -maps 100 -reduces 10
> Trace :
> 2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
> 192.168.105.214:36320: error:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)