[
https://issues.apache.org/jira/browse/HDDS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610567#comment-16610567
]
Shashikant Banerjee commented on HDDS-420:
------------------------------------------
Patch v4 fixes the closePipeline code in RatisManager so that, if
destroyPipeline fails, the nodes are not removed from the Ratis members list
and therefore cannot be reused for new pipeline formation. It also adds some
necessary logging in the pipeline close path and updates Ratis to the latest
snapshot version. This needs RATIS-310 to be resolved.
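
The intended closePipeline behaviour can be illustrated with a minimal Java
sketch. This is not the code from the patch: the class PipelineCloseSketch,
the PipelineDestroyer interface, and all method names are hypothetical
stand-ins, and datanodes are reduced to plain strings. It only shows the
ordering that matters, namely that nodes leave the members list only after
the destroy call has succeeded.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names are assumptions, not the RatisManager API.
final class PipelineCloseSketch {

  /** Hypothetical stand-in for the call that destroys the Ratis group on the datanodes. */
  interface PipelineDestroyer {
    void destroy(List<String> nodes) throws IOException;
  }

  // Nodes that currently belong to a Ratis pipeline. A node in this set is
  // skipped when a new pipeline is formed.
  private final Set<String> ratisMembers = new HashSet<>();

  /** Marks the given nodes as members of a pipeline. */
  void addMembers(List<String> nodes) {
    ratisMembers.addAll(nodes);
  }

  /** A node is available for a new pipeline only if it is not a current member. */
  boolean isAvailable(String node) {
    return !ratisMembers.contains(node);
  }

  /**
   * Tries to destroy the pipeline. The nodes are released for reuse only if
   * the destroy call succeeds; on failure they stay in the members list.
   *
   * @return true if the pipeline was destroyed and the nodes were released
   */
  boolean closePipeline(List<String> nodes, PipelineDestroyer destroyer) {
    try {
      destroyer.destroy(nodes);
    } catch (IOException e) {
      // Log the failure and keep the nodes reserved, since they may still be
      // part of the old Ratis group.
      System.err.println("Pipeline destroy failed, keeping nodes reserved: "
          + e.getMessage());
      return false;
    }
    ratisMembers.removeAll(nodes);
    return true;
  }
}
```

Keeping the nodes reserved after a failed destroy avoids handing a datanode
that may still belong to its old group to a new pipeline, which is the
situation the GroupMismatchException in the SCM log below points to.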
> putKey failing with KEY_ALLOCATION_ERROR
> ----------------------------------------
>
> Key: HDDS-420
> URL: https://issues.apache.org/jira/browse/HDDS-420
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Nilotpal Nandi
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-420.000.patch, all-node-ozone-logs-1536607597.tar.gz
>
>
> Here are the commands run :
> {noformat}
> [root@ctr-e138-1518143905142-468367-01-000002 bin]# ./ozone oz -putKey
> /fs-volume/fs-bucket/nn1 -file /etc/passwd
> 2018-09-09 15:39:31,131 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Create key failed, error:KEY_ALLOCATION_ERROR
> [root@ctr-e138-1518143905142-468367-01-000002 bin]#
> [root@ctr-e138-1518143905142-468367-01-000002 bin]# ./ozone fs -copyFromLocal
> /etc/passwd /
> 2018-09-09 15:40:16,879 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2018-09-09 15:40:23,632 [main] ERROR - Try to allocate more blocks for write
> failed, already allocated 0 blocks for this write.
> copyFromLocal: Message missing required fields: keyLocation
> [root@ctr-e138-1518143905142-468367-01-000002 bin]# ./ozone oz -putKey
> /fs-volume/fs-bucket/nn2 -file /etc/passwd
> 2018-09-09 15:44:55,912 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Create key failed, error:KEY_ALLOCATION_ERROR{noformat}
>
> hadoop version :
> ---------------------------
> {noformat}
> [root@ctr-e138-1518143905142-468367-01-000002 bin]# ./hadoop version
> Hadoop 3.2.0-SNAPSHOT
> Source code repository git://git.apache.org/hadoop.git -r
> bf8a1750e99cfbfa76021ce51b6514c74c06f498
> Compiled by root on 2018-09-08T10:22Z
> Compiled with protoc 2.5.0
> From source with checksum c5bbb375aed8edabd89c377af83189d
> This command was run using
> /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/share/hadoop/common/hadoop-common-3.2.0-SNAPSHOT.jar{noformat}
>
> scm log :
> ---------------
> {noformat}
> 2018-09-09 15:45:00,907 INFO
> org.apache.hadoop.hdds.scm.pipelines.ratis.RatisManagerImpl: Allocating a new
> ratis pipeline of size: 3 id: pipelineId=f210716d-ba7b-4adf-91d6-da286e5fd010
> 2018-09-09 15:45:00,973 INFO org.apache.ratis.conf.ConfUtils: raft.rpc.type =
> GRPC (default)
> 2018-09-09 15:45:01,007 INFO org.apache.ratis.conf.ConfUtils:
> raft.grpc.message.size.max = 33554432 (custom)
> 2018-09-09 15:45:01,011 INFO org.apache.ratis.conf.ConfUtils:
> raft.client.rpc.retryInterval = 300 ms (default)
> 2018-09-09 15:45:01,012 INFO org.apache.ratis.conf.ConfUtils:
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-09-09 15:45:01,012 INFO org.apache.ratis.conf.ConfUtils:
> raft.client.async.scheduler-threads = 3 (default)
> 2018-09-09 15:45:01,020 INFO org.apache.ratis.conf.ConfUtils:
> raft.grpc.flow.control.window = 1MB (=1048576) (default)
> 2018-09-09 15:45:01,020 INFO org.apache.ratis.conf.ConfUtils:
> raft.grpc.message.size.max = 33554432 (custom)
> 2018-09-09 15:45:01,102 INFO org.apache.ratis.conf.ConfUtils:
> raft.client.rpc.request.timeout = 3000 ms (default)
> 2018-09-09 15:45:01,667 ERROR org.apache.hadoop.hdds.scm.XceiverClientRatis:
> Failed to reinitialize
> RaftPeer:bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9:172.27.12.96:9858 datanode:
> bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9{ip: 172.27.12.96, host:
> ctr-e138-1518143905142-468367-01-000007.hwx.site}
> org.apache.ratis.protocol.GroupMismatchException:
> bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9: The group (group-7347726F7570) of
> client-409D68EB500F does not match the group (group-2041ABBEE452) of the
> server bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:222)
> at
> org.apache.ratis.grpc.RaftGrpcUtil.tryUnwrapException(RaftGrpcUtil.java:79)
> at org.apache.ratis.grpc.RaftGrpcUtil.unwrapException(RaftGrpcUtil.java:67)
> at
> org.apache.ratis.grpc.client.RaftClientProtocolClient.blockingCall(RaftClientProtocolClient.java:127)
> at
> org.apache.ratis.grpc.client.RaftClientProtocolClient.reinitialize(RaftClientProtocolClient.java:102)
> at
> org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:77)
> at
> org.apache.ratis.client.impl.RaftClientImpl.sendRequest(RaftClientImpl.java:302)
> at
> org.apache.ratis.client.impl.RaftClientImpl.reinitialize(RaftClientImpl.java:216)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.reinitialize(XceiverClientRatis.java:163)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.reinitialize(XceiverClientRatis.java:133)
> at
> org.apache.hadoop.hdds.scm.XceiverClientRatis.createPipeline(XceiverClientRatis.java:97)
> at
> org.apache.hadoop.hdds.scm.pipelines.ratis.RatisManagerImpl.initializePipeline(RatisManagerImpl.java:105)
> at
> org.apache.hadoop.hdds.scm.pipelines.PipelineSelector.getReplicationPipeline(PipelineSelector.java:303)
> at
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.allocateContainer(ContainerStateManager.java:299)
> at
> org.apache.hadoop.hdds.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:289)
> at
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.preAllocateContainers(BlockManagerImpl.java:167)
> at
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:266)
> at
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6271)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: INTERNAL:
> bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9: The group (group-7347726F7570) of
> client-409D68EB500F does not match the group (group-2041ABBEE452) of the
> server bfe9c5f2-da9b-4a8f-9013-7540cbbed1c9
> at
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:222)
> at
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:203)
> at
> org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:132)
> at
> org.apache.ratis.shaded.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.reinitialize(AdminProtocolServiceGrpc.java:220)
> at
> org.apache.ratis.grpc.client.RaftClientProtocolClient.lambda$reinitialize$1(RaftClientProtocolClient.java:104)
> at
> org.apache.ratis.grpc.client.RaftClientProtocolClient.blockingCall(RaftClientProtocolClient.java:125)
> ... 24 more{noformat}
>