[
https://issues.apache.org/jira/browse/HDDS-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai resolved HDDS-4722.
------------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
> Creating RDBStore fails due to RDBMetrics instance race
> --------------------------------------------------------
>
> Key: HDDS-4722
> URL: https://issues.apache.org/jira/browse/HDDS-4722
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 1.0.0
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> I am using Ozone APIs to create containers, and the tool occasionally aborts
> because of a data race when accessing the RDBMetrics instance:
> {noformat}
> 2021-01-09 02:39:36,944 [pool-1-thread-4] INFO keyvalue.KeyValueContainer: Container 318054 is closed with bcsId 0.
> 2021-01-09 02:39:36,988 [pool-1-thread-17] ERROR freon.BaseFreonGenerator: Error on executing task 318048
> com.google.common.util.concurrent.UncheckedExecutionException: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
> at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
> at org.apache.hadoop.ozone.freon.ContainerGenerator.lambda$writeContainer$1(ContainerGenerator.java:489)
> at com.codahale.metrics.Timer.time(Timer.java:101)
> at org.apache.hadoop.ozone.freon.ContainerGenerator.writeContainer(ContainerGenerator.java:485)
> at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:189)
> at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:169)
> at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:152)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
> at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
> at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
> at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
> at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
> at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaTwoImpl.<init>(DatanodeStoreSchemaTwoImpl.java:48)
> at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.createContainerMetaData(KeyValueContainerUtil.java:112)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:133)
> at org.apache.hadoop.ozone.freon.ContainerGenerator.createContainer(ContainerGenerator.java:463)
> at org.apache.hadoop.ozone.freon.ContainerGenerator.access$100(ContainerGenerator.java:109)
> at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:357)
> at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:353)
> at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
> {noformat}
> Looking at the code, I believe RDBMetrics#unRegister() should be made
> synchronized. Otherwise, concurrently creating and closing RDBStore objects
> can race on the shared RDBMetrics instance.
> After making RDBMetrics#unRegister() synchronized, the tool no longer aborts
> due to the race.
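
The race described above can be illustrated with a minimal, self-contained sketch. This is not the actual RDBMetrics code: FakeMetricsSystem, SketchMetrics, and RaceDemo are hypothetical stand-ins, and the sketch assumes the metrics class keeps a static singleton reference that unRegister() clears before removing the source from the metrics system. Under that assumption, an unsynchronized unRegister() lets another thread enter create() after the reference is cleared but before the source is removed, so the second registration fails with "already exists". Making both methods synchronized on the class closes that window.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a metrics system that rejects duplicate source names.
final class FakeMetricsSystem {
  private static final Set<String> SOURCES = new HashSet<>();

  static synchronized void register(String name) {
    if (!SOURCES.add(name)) {
      throw new IllegalStateException("Metrics source " + name + " already exists!");
    }
  }

  static synchronized void unregister(String name) {
    SOURCES.remove(name);
  }
}

// Hypothetical singleton metrics holder (assumed shape, not the real RDBMetrics).
final class SketchMetrics {
  private static final String SOURCE_NAME = "SketchMetrics";
  private static SketchMetrics instance;

  private SketchMetrics() { }

  // Registers the source only when no instance exists yet.
  static synchronized SketchMetrics create() {
    if (instance == null) {
      FakeMetricsSystem.register(SOURCE_NAME);
      instance = new SketchMetrics();
    }
    return instance;
  }

  // synchronized: clearing the reference and removing the source now happen
  // as one step with respect to create(), which holds the same class lock.
  static synchronized void unRegister() {
    instance = null;
    FakeMetricsSystem.unregister(SOURCE_NAME);
  }
}

// Hypothetical stress driver: many threads repeatedly create and unregister.
final class RaceDemo {
  // Returns how many create() calls failed with a duplicate-source error.
  static int runStress(int threads, int iterations) {
    AtomicInteger failures = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < iterations; i++) {
          try {
            SketchMetrics.create();
            SketchMetrics.unRegister();
          } catch (IllegalStateException e) {
            failures.incrementAndGet();
          }
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
    return failures.get();
  }
}
```

Calling RaceDemo.runStress(8, 1000) should report no failures with both methods synchronized. Removing the synchronized keyword from unRegister() in this sketch reopens the window, although whether a given run actually hits it depends on thread scheduling.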