[
https://issues.apache.org/jira/browse/HDDS-1451?focusedWorklogId=238789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-238789
]
ASF GitHub Bot logged work on HDDS-1451:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/May/19 20:40
Start Date: 07/May/19 20:40
Worklog Time Spent: 10m
Work Description: avijayanhwx commented on pull request #799: HDDS-1451 :
SCMBlockManager findPipeline and createPipeline are not lock protected.
URL: https://github.com/apache/hadoop/pull/799
getPipelines() and createPipeline() already appear to take a lock in their
implementations. The problem described here, however, is a race condition
between the call to getPipelines and the call to createPipeline in
BlockManagerImpl#allocateBlock. The fix is to call getPipelines again after a
failed createPipeline call, to pick up any pipelines created in the meantime.
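A minimal sketch of that re-check, for illustration only. PipelineSource,
RacySource and choosePipeline are hypothetical names, pipelines are reduced to
plain strings, and a plain IOException stands in for the creation failure;
this is not the actual BlockManagerImpl code or the patch in the pull request.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class AllocateBlockSketch {

  /** Hypothetical stand-in for the SCM pipeline manager; not the real interface. */
  interface PipelineSource {
    List<String> getPipelines();

    String createPipeline() throws IOException;
  }

  /**
   * Returns an existing pipeline if there is one, otherwise tries to create
   * one. If creation fails, looks again before giving up.
   */
  static String choosePipeline(PipelineSource source) throws IOException {
    List<String> available = source.getPipelines();
    if (available.isEmpty()) {
      try {
        return source.createPipeline();
      } catch (IOException e) {
        // A concurrent caller may have created a pipeline (and claimed the
        // free datanodes) after our first check, so check once more.
        available = source.getPipelines();
        if (available.isEmpty()) {
          throw e; // nothing appeared in the meantime; report the failure
        }
      }
    }
    return available.get(0);
  }

  /** Simulates losing the race: first lookup is empty, creation fails, second lookup succeeds. */
  static class RacySource implements PipelineSource {
    private int lookups = 0;

    @Override
    public List<String> getPipelines() {
      lookups++;
      return lookups == 1
          ? Collections.<String>emptyList()
          : Collections.singletonList("pipeline-created-by-other-thread");
    }

    @Override
    public String createPipeline() throws IOException {
      throw new IOException("Cannot create pipeline: datanodes already in use");
    }
  }

  public static void main(String[] args) throws IOException {
    // Prints the pipeline found by the second lookup.
    System.out.println(choosePipeline(new RacySource()));
  }
}
```

In the simulated run, the caller that loses the race still gets a usable
pipeline, and the original exception is rethrown only when the second lookup
is also empty.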
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 238789)
Time Spent: 10m
Remaining Estimate: 0h
> SCMBlockManager findPipeline and createPipeline are not lock protected
> ----------------------------------------------------------------------
>
> Key: HDDS-1451
> URL: https://issues.apache.org/jira/browse/HDDS-1451
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> SCM BlockManager may try to allocate pipelines when they are not needed.
> This happens because BlockManagerImpl#allocateBlock is not lock protected,
> so multiple pipelines can be allocated from it. One of the pipeline
> allocations can fail even though an existing pipeline is already
> available.
> {code}
> 2019-04-22 22:34:14,336 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes:
> 76e1a493-fd55-4d67-9f55-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}12248721-ea6a-453f-8dad-fc7fbe692fd2{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,386 INFO impl.RoleInfo
> (RoleInfo.java:shutdownLeaderElection(134)) -
> e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection
> 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes:
> e17b7852-4691-40c7-8791-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}9583a007-7f02-4074-9e26-19bc18e29ec5{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,388 INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143))
> - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState
> 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes:
> 80f16ad6-3879-4a64-a3c7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}dd5f5a70-0217-4577-b7a2-c42aa139d18a{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,389 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> be4854e5-7933-4caa-b32e-f482cf500247, Nodes:
> 6e2356f1-479d-498b-876a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}a3ed36a1-44ca-47b2-b9b3-5aeef0459518{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes:
> 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id:
> cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes:
> 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104,
> certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104,
> host: 192.168.0.104, certSerialId:
> null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host:
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl
> (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for
> type:RATIS factor:THREE
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot
> create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes.
> at
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:122)
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:57)
> at
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:148)
> at
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:190)
> at
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:172)
> at
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:82)
> at
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:7533)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-04-22 22:34:14,395 ERROR block.BlockManagerImpl
> (BlockManagerImpl.java:allocateBlock(213)) - Unable to allocate a block for
> the size: 16384, type: RATIS, factor: THREE
> {code}