[ 
https://issues.apache.org/jira/browse/HDDS-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801772#comment-17801772
 ] 

Devesh Kumar Singh edited comment on HDDS-10017 at 1/2/24 12:55 PM:
--------------------------------------------------------------------

[~conway] Based on the exception trace above: the "scm.snapshot.db" snapshot
file is created when Recon has been down while SCM and all DNs were up, and
Recon is then restarted after a while, which can leave the set of containers
known to SCM and to Recon out of sync. If the difference between the SCM
container count and the Recon container count is greater than 100, Recon pulls
the SCM metadata DB and keeps a copy on the Recon node. That file is later
renamed to recon-scm.db, so in the normal (happy-path) flow I am not able to
reproduce the issue. The exception in the logs above comes from
PipelineSyncTask when it tries to update the pipeline state in the pipeline
RDB table through a reference to the new RocksDB store snapshot file
"scm.snapshot.db..*".
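
As a rough illustration of the threshold check described above, here is a
minimal sketch. It is not Recon's actual code: the class name, method name and
constant are hypothetical, the value 100 is taken from this comment only, and
treating the difference as an absolute value is an assumption.

```java
// Hypothetical sketch of the "pull a full SCM DB snapshot?" decision.
// None of these names come from the Ozone code base.
final class SnapshotSyncSketch {

    // Threshold quoted in the comment above; the real value may be configurable.
    static final long CONTAINER_COUNT_DIFF_THRESHOLD = 100;

    // Returns true when the container counts have drifted far enough apart
    // that Recon would fetch the whole SCM metadata DB instead of syncing
    // incrementally. Using the absolute difference is an assumption.
    static boolean needsFullSnapshot(long scmContainerCount, long reconContainerCount) {
        long diff = Math.abs(scmContainerCount - reconContainerCount);
        return diff > CONTAINER_COUNT_DIFF_THRESHOLD;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        System.out.println(needsFullSnapshot(228, 100)); // difference 128
        System.out.println(needsFullSnapshot(150, 100)); // difference 50
    }
}
```

With a check of this shape, a short Recon outage would normally stay below the
threshold, and only a longer one would trigger the full snapshot download.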

Can you check whether
"/ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log"
exists in your cluster? If it does not, try restarting your cluster. Ideally,
Recon will not create this file on its own.
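
One way to run that check on the Recon node is sketched below. The class and
method names are hypothetical; the directory and file name are the ones from
the stack trace, and only standard java.nio.file calls are used.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for checking whether a RocksDB log file is present
// inside a snapshot DB directory. Not part of Ozone.
final class SnapshotFileCheck {

    // True only if dbDir/logFileName exists and is a regular file.
    static boolean walFileExists(Path dbDir, String logFileName) {
        return Files.isRegularFile(dbDir.resolve(logFileName));
    }

    public static void main(String[] args) {
        // Paths copied from the exception message in the issue description.
        Path dbDir = Paths.get("/ozonedata/recon/metadata/scm.snapshot.db_1703562071086");
        String logFile = "001042.log";

        System.out.println("snapshot directory exists: " + Files.isDirectory(dbDir));
        System.out.println(logFile + " exists: " + walFileExists(dbDir, logFile));
    }
}
```

If the directory itself is missing, that would be consistent with the snapshot
having already been renamed while something still held a reference to the old
path, though the trace alone does not confirm this.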

 



> [Timer for 'Recon' metrics system] Rocks Database is closed
> -----------------------------------------------------------
>
>                 Key: HDDS-10017
>                 URL: https://issues.apache.org/jira/browse/HDDS-10017
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>    Affects Versions: 1.4.0
>            Reporter: Conway Zhang
>            Assignee: Devesh Kumar Singh
>            Priority: Major
>
> I get a "Rocks Database is closed" error when using Recon:
> {code:java}
> 2023-12-26 17:36:53,385 [PipelineSyncTask] INFO org.apache.hadoop.ozone.recon.scm.ReconPipelineManager: Adding new pipeline PipelineID=40552754-9300-..-3aadeaf41348 from SCM.
> 2023-12-26 17:36:54,556 [PipelineSyncTask] ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl: Pipeline PipelineID=fc776cf8-43d1-494a-..-c930456905eb state update failed
> 2023-12-26 17:36:54,568 [PipelineSyncTask] ERROR org.apache.hadoop.ozone.recon.scm.PipelineSyncTask: Exception in Pipeline sync Thread.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: org.apache.ratis.protocol.exceptions.StateMachineException: java.io.IOException from Server peer@group-075CE2E08D2E: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.translateException(SCMHAInvocationHandler.java:165)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:115)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:74)
>     at com.sun.proxy.$Proxy48.updatePipelineState(Unknown Source)
>     at org.apache.hadoop.ozone.recon.scm.ReconPipelineManager.initializePipelines(ReconPipelineManager.java:114)
>     at org.apache.hadoop.ozone.recon.scm.PipelineSyncTask.triggerPipelineSyncTask(PipelineSyncTask.java:92)
>     at org.apache.hadoop.ozone.recon.scm.PipelineSyncTask.run(PipelineSyncTask.java:75)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: java.io.IOException from Server peer@group-075CE2E08D2E: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.submitRequest(SCMHAManagerStub.java:199)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatisServer(SCMHAInvocationHandler.java:123)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:112)
>     ... 6 more
> Caused by: java.io.IOException: RocksDatabase[/ozonedata/recon/metadata/scm.snapshot.db_1703562071086]: Failed to put �wl�C�IJ�4�0Ei^E�; status : IOError(Undefined); message : While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:667)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:98)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.put(RocksDatabase.java:501)
>     at org.apache.hadoop.hdds.utils.db.RDBTable.put(RDBTable.java:70)
>     at org.apache.hadoop.hdds.utils.db.TypedTable.put(TypedTable.java:156)
>     at org.apache.hadoop.hdds.scm.metadata.SCMDBTransactionBufferImpl.addToBuffer(SCMDBTransactionBufferImpl.java:36)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManagerImpl.updatePipelineState(PipelineStateManagerImpl.java:296)
>     at sun.reflect.GeneratedMethodAccessor328.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.process(SCMHAManagerStub.java:229)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAManagerStub$RatisServerStub.submitRequest(SCMHAManagerStub.java:191)
>     ... 8 more
> Caused by: org.rocksdb.RocksDBException: While open a file for appending: /ozonedata/recon/metadata/scm.snapshot.db_1703562071086/001042.log: No such file or directory
>     at org.rocksdb.RocksDB.putDirect(Native Method)
>     at org.rocksdb.RocksDB.put(RocksDB.java:981)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.put(RocksDatabase.java:498)
>     ... 17 more
> 2023-12-26 17:36:57,768 [pool-51-thread-1] INFO org.apache.hadoop.ozone.recon.scm.ReconStorageContainerManagerFacade: Got list of containers from SCM : 128
> 2023-12-26 17:37:01,032 [Timer for 'Recon' metrics system] ERROR org.apache.hadoop.hdds.utils.RocksDBStoreMetrics: Failed to get property mem-table-flush-pending from rocksdb
> java.io.IOException: Rocks Database is closed
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:458)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getProperty(RocksDatabase.java:822)
>     at org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(RocksDBStoreMetrics.java:214)
>     at org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(RocksDBStoreMetrics.java:151)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:423)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:410)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:385)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:372)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
