[
https://issues.apache.org/jira/browse/HDDS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Rose updated HDDS-1765:
-----------------------------
Target Version/s: 1.3.0 (was: 1.2.0)
I am managing the 1.2.0 release, and we currently have more than 600 issues
targeted for 1.2.0. I am moving the target field to 1.3.0.
If you are actively working on this Jira and believe it should be targeted
for the 1.2.0 release, please reach out to me via Apache email or Slack.
> destroyPipeline scheduled from finalizeAndDestroyPipeline fails for short
> dead node interval
> --------------------------------------------------------------------------------------------
>
> Key: HDDS-1765
> URL: https://issues.apache.org/jira/browse/HDDS-1765
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Supratim Deka
> Priority: Minor
> Labels: MiniOzoneChaosCluster, Triaged
>
> This happens when OZONE_SCM_PIPELINE_DESTROY_TIMEOUT exceeds the value of
> OZONE_SCM_DEADNODE_INTERVAL, which is the case for start-chaos.sh.
> When a Datanode is shut down, the SCM stale node handler calls
> finalizeAndDestroyPipeline(), which schedules the destroyPipeline() operation
> with a delay of OZONE_SCM_PIPELINE_DESTROY_TIMEOUT. By the time the scheduled
> operation runs, the dead node handler has already destroyed the pipeline.
>
> {code:java}
> 2019-07-05 14:45:16,358 INFO pipeline.SCMPipelineManager
> (SCMPipelineManager.java:finalizeAndDestroyPipeline(307)) - destroying
> pipeline:Pipeline[ Id: ef60537a-0a82-4fea-a574-109c881fa140, Nodes:
> 7947bf32-faaa-4b34-bf1e-2752a929938c{ip: 192.168.1.6, host: 192.168.1.6,
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE,
> State:CLOSED]
> 2019-07-05 14:45:16,363 INFO pipeline.PipelineStateManager
> (PipelineStateManager.java:removePipeline(108)) - Pipeline Pipeline[ Id:
> ef60537a-0a82-4fea-a574-109c881fa140, Nodes:
> 7947bf32-faaa-4b34-bf1e-2752a929938c{ip: 192.168.1.6, host: 192.168.1.6,
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE,
> State:CLOSED] removed from db
> ...
> 2019-07-05 14:46:12,400 WARN pipeline.RatisPipelineUtils
> (RatisPipelineUtils.java:destroyPipeline(66)) - Pipeline destroy failed for
> pipeline=PipelineID=ef60537a-0a82-4fea-a574-109c881fa140
> dn=7947bf32-faaa-4b34-bf1e-2752a929938c{ip: 192.168.1.6, host: 192.168.1.6,
> networkLocation: /default-rack, certSerialId: null}
> 2019-07-05 14:46:12,401 ERROR pipeline.SCMPipelineManager
> (Scheduler.java:lambda$schedule$1(70)) - Destroy pipeline failed for
> pipeline:Pipeline[ Id: ef60537a-0a82-4fea-a574-109c881fa140, Nodes:
> 7947bf32-faaa-4b34-bf1e-2752a929938c{ip: 192.168.1.6, host: 192.168.1.6,
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE,
> State:OPEN]
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException:
> PipelineID=ef60537a-0a82-4fea-a574-109c881fa140 not found
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:132)
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:322)
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:107)
> at
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:401)
> at
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.destroyPipeline(SCMPipelineManager.java:387)
> at
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.lambda$finalizeAndDestroyPipeline$0(SCMPipelineManager.java:321)
> at
> org.apache.hadoop.utils.Scheduler.lambda$schedule$1(Scheduler.java:68)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>
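The race in the quoted description can be reproduced in isolation. The sketch below is illustrative only and is not the SCM code: the class PipelineRaceSketch, its Registry stand-in for the pipeline state map, and the replay() helper are all hypothetical names. It schedules a delayed destroy (the stale node path) and an earlier removal (the dead node path), then shows one possible mitigation, assuming it is acceptable to treat an already-removed pipeline as a no-op.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the race; not the Ozone SCM implementation. */
class PipelineRaceSketch {

    /** Minimal stand-in for the SCM pipeline state map (assumed shape). */
    static final class Registry {
        private final Map<String, String> pipelines = new ConcurrentHashMap<>();

        void add(String id) {
            pipelines.put(id, "OPEN");
        }

        /** Strict removal: throws when the pipeline is already gone. */
        void removeStrict(String id) {
            if (pipelines.remove(id) == null) {
                throw new IllegalStateException("PipelineID=" + id + " not found");
            }
        }

        /** Idempotent removal: reports false instead of throwing. */
        boolean removeIfPresent(String id) {
            return pipelines.remove(id) != null;
        }
    }

    /**
     * Schedules a dead-node removal and a delayed destroy on one pipeline and
     * returns the outcome of the delayed destroy task.
     */
    static String replay(long destroyTimeoutMs, long deadNodeIntervalMs, boolean tolerant) {
        Registry registry = new Registry();
        registry.add("p1");
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            // Dead node path: removes the pipeline after the dead node interval.
            ScheduledFuture<?> deadNode = scheduler.schedule(
                () -> { registry.removeIfPresent("p1"); },
                deadNodeIntervalMs, TimeUnit.MILLISECONDS);

            // Stale node path: destroy scheduled with the destroy timeout as delay.
            ScheduledFuture<String> delayedDestroy = scheduler.schedule(() -> {
                if (tolerant) {
                    return registry.removeIfPresent("p1") ? "destroyed" : "already-removed";
                }
                registry.removeStrict("p1");
                return "destroyed";
            }, destroyTimeoutMs, TimeUnit.MILLISECONDS);

            deadNode.get();
            return delayedDestroy.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

With a destroy delay longer than the dead node interval, for example replay(200, 20, false), the strict variant fails with a "not found" message, analogous to the PipelineNotFoundException in the trace above. With tolerant set to true, the delayed task instead reports that the pipeline was already removed. Whether a no-op is the right behaviour for the real destroyPipeline() is a design decision for the SCM maintainers.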