[ 
https://issues.apache.org/jira/browse/HDDS-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai resolved HDDS-4722.
------------------------------------
    Fix Version/s: 1.1.0
       Resolution: Fixed

>  Creating RDBStore fails due to RDBMetrics instance race
> --------------------------------------------------------
>
>                 Key: HDDS-4722
>                 URL: https://issues.apache.org/jira/browse/HDDS-4722
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 1.0.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> I am using Ozone APIs to create containers, and the run occasionally aborts
> due to a data race in accessing the RDBMetrics instance:
> {noformat}
> 2021-01-09 02:39:36,944 [pool-1-thread-4] INFO keyvalue.KeyValueContainer: 
> Container 318054 is closed with bcsId 0.
> 2021-01-09 02:39:36,988 [pool-1-thread-17] ERROR freon.BaseFreonGenerator: 
> Error on executing task 318048
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics 
> already exists!
>         at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
>         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>         at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator.lambda$writeContainer$1(ContainerGenerator.java:489)
>         at com.codahale.metrics.Timer.time(Timer.java:101)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator.writeContainer(ContainerGenerator.java:485)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:189)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:169)
>         at 
> org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:152)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
> RDBMetrics already exists!
>         at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>         at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>         at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>         at 
> org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
>         at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
>         at 
> org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
>         at 
> org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
>         at 
> org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
>         at 
> org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaTwoImpl.<init>(DatanodeStoreSchemaTwoImpl.java:48)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.createContainerMetaData(KeyValueContainerUtil.java:112)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:133)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator.createContainer(ContainerGenerator.java:463)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator.access$100(ContainerGenerator.java:109)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:357)
>         at 
> org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:353)
>         at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
>         at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
> {noformat}
> Looking at the code, I believe RDBMetrics#unRegister() should be made
> synchronized. Otherwise, concurrently creating and closing RDBStore objects
> can race on the shared RDBMetrics instance.
> After making RDBMetrics#unRegister() synchronized, the tool no longer aborts
> due to the race.
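
To illustrate the reasoning in the description, here is a minimal, self-contained sketch of the pattern. It is not the Ozone source: FakeRegistry is a hypothetical stand-in for the Hadoop metrics system (it only rejects duplicate source names), and the Metrics class assumes a lazily created static singleton whose create() and unRegister() both take the class lock, which is the fix proposed above. The stress() helper and its parameters are likewise made up for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RdbMetricsSketch {

  /** Hypothetical stand-in for a metrics system that rejects duplicate names. */
  static final class FakeRegistry {
    private final Set<String> sources = ConcurrentHashMap.newKeySet();

    void register(String name) {
      if (!sources.add(name)) {
        throw new IllegalStateException(
            "Metrics source " + name + " already exists!");
      }
    }

    void unregister(String name) {
      sources.remove(name);
    }
  }

  /** Assumed shape of the singleton: one shared instance, created lazily. */
  static final class Metrics {
    private static final String SOURCE_NAME = "RDBMetrics";
    private static final FakeRegistry REGISTRY = new FakeRegistry();
    private static Metrics instance;

    static synchronized Metrics create() {
      if (instance != null) {
        return instance;
      }
      REGISTRY.register(SOURCE_NAME);
      instance = new Metrics();
      return instance;
    }

    // Without "synchronized" here, another thread could enter create()
    // after instance is cleared but before the name is unregistered,
    // and register() would then fail with "already exists".
    static synchronized void unRegister() {
      instance = null;
      REGISTRY.unregister(SOURCE_NAME);
    }
  }

  /** Runs create/unRegister pairs concurrently and counts failures. */
  static int stress(int threads, int iterations) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger failures = new AtomicInteger();
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < iterations; i++) {
          try {
            Metrics.create();
            Metrics.unRegister();
          } catch (IllegalStateException e) {
            failures.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return failures.get();
  }

  public static void main(String[] args) {
    System.out.println("failures: " + stress(8, 10_000));
  }
}
```

With both methods synchronized, the instance field and the registered name always change together under one lock, so the stress loop should report no failures. Removing synchronized from unRegister() in this sketch reopens the window described in the comment, although how often it is hit depends on thread scheduling.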



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
