[
https://issues.apache.org/jira/browse/HDFS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076332#comment-14076332
]
Hari Krishna Dara commented on HDFS-4929:
-----------------------------------------
I faced this exact same issue. My workaround was to modify NNBench to encode the
map counter as a sequence and use that counter value as part of the filename
instead of the hostname. This makes the filenames unique and keeps multiple maps
running on the same node from stepping on each other. However, it requires that
the subsequent operations (read, rename and delete) be run with exactly the same
number of maps.
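
For illustration, the change is roughly the following (a Java sketch, not my
actual patch; the class and method names are mine, while TaskAttemptID and the
"mapred.task.id" property are standard Hadoop):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TaskAttemptID;

public class UniqueNameSketch {
  // Returns a data file path that is unique per map task. fileIndex is the
  // per-map file counter NNBench already loops over; dataDir stands in for
  // the data directory under -baseDir.
  public static Path dataFileFor(JobConf conf, Path dataDir, long fileIndex) {
    // Old-API jobs expose the attempt id under "mapred.task.id".
    TaskAttemptID attempt = TaskAttemptID.forName(conf.get("mapred.task.id"));
    int mapSequence = attempt.getTaskID().getId(); // 0 .. numMaps-1, unique per map
    // e.g. file_17_0 instead of file_linux-214__0
    return new Path(dataDir, "file_" + mapSequence + "_" + fileIndex);
  }
}

Because the name now depends only on the map's sequence id and the file counter,
a later read/rename/delete run regenerates the same names only when it is
started with the same -maps value, which is the restriction mentioned above.
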
In fact, the way NNBench works is fundamentally flawed. You are expected to first
run "create_write" and then follow up with the other operations. Since the
hostname is part of the filename, this only works when the maps for the
follow-up operations are guaranteed to run on exactly the same nodes, and there
is no such guarantee.
There is also an issue with the maps piling up on a small number of nodes
instead of being distributed evenly across all nodes. In my testing, when I ran
13 maps on a cluster with 13 slaves, all of the maps were allocated to only two
nodes. To generate a lot of load I want to increase the number of maps, but the
way it currently works, those maps are very likely to crowd onto a few nodes,
and once they exceed a node's capacity they just queue up and fail the barrier
condition. I don't know the right solution, but I worked around it for my use
case by setting the replication count of the control files to the number of
slave nodes, so that the maps have an equal chance of being scheduled on any
node. However, this only makes sense when the number of maps is high compared
to the capacity of the cluster.
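
For reference, that replication workaround amounts to something like this (a
sketch only; I am assuming the control files sit in a control directory under
-baseDir, and the class and method names are illustrative):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ControlFileReplication {
  // controlDir stands in for NNBench's control directory under -baseDir.
  public static void spreadControlFiles(Configuration conf, Path controlDir,
                                        short numSlaveNodes) throws IOException {
    FileSystem fs = controlDir.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(controlDir)) {
      // With a replica on every node, map scheduling (which favours nodes that
      // hold the control-file blocks) is no longer biased toward a few nodes.
      fs.setReplication(status.getPath(), numSlaveNodes);
    }
  }
}
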
> [NNBench mark] Lease mismatch error when running with multiple mappers
> ----------------------------------------------------------------------
>
> Key: HDFS-4929
> URL: https://issues.apache.org/jira/browse/HDFS-4929
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: benchmarks
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
>
> Command :
> ./yarn jar
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar
> nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456
> -bytesToWrite 1024000000 -baseDir /benchmarks/NNBench`hostname -s`
> -replicationFactorPerFile 3 -maps 100 -reduces 10
> Trace :
> 2013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
> 192.168.105.214:36320: error:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch
> on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by
> DFSClient_attempt_1371782327901_0001_m_000048_0_1383437860_1 but is accessed
> by DFSClient_attempt_1371782327901_0001_m_000084_0_1880545303_1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)