[ 
https://issues.apache.org/jira/browse/HDDS-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828634#comment-17828634
 ] 

Hongbing Wang commented on HDDS-10177:
--------------------------------------

I am just posting some logs from our cluster related to this ticket. So far, 
no other impact has been found.
{noformat}
2024-03-20 01:51:29,944 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.RDBSnapshotProvider: Ratis snapshot transfer is 
complete.
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: Installing checkpoint with 
OMTransactionInfo 4#6100006845
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
KeyDeletingService
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
DirectoryDeletingService
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
OpenKeyCleanupService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
SstFilteringService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
SnapshotDeletingService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
MultipartUploadCleanupService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: 
OzoneManagerStateMachine is pausing
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Stopping 
OMDoubleBuffer flush thread
2024-03-20 01:51:37,541 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ipc.Server: Stopping server on 9862
2024-03-20 01:51:37,549 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: RPC server is stopped. Spend 9 ms.
2024-03-20 01:51:37,550 [om3-InstallSnapshotThread] INFO 
org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Shutting down 
CompactionDagPruningService.
2024-03-20 01:51:39,070 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: metadataManager is stopped. Spend 1520 
ms.
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: Replaced DB with checkpoint from OM: 
om2, term: 4, index: 6100006845, time: 58 ms
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down 
executorService: 'SnapDiffExecutor'
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down 
executorService: 'SstDumpToolExecutor'
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service 
SnapshotDiffCleanupService
2024-03-20 01:51:39,131 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.helpers.OmKeyInfo: OmKeyInfo.getCodec ignorePipeline 
= true
2024-03-20 01:51:39,136 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder: Using RocksDB DBOptions from 
om.db.ini file
2024-03-20 01:51:40,822 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.db.RocksDatabase: 
ozone.om.skip.error.close.rocksdb value is: true.
2024-03-20 01:51:40,853 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: S3 Multi-Tenancy is disabled
2024-03-20 01:51:40,854 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OmSnapshotManager: Ozone filesystem snapshot feature 
is enabled.
2024-03-20 01:51:40,855 [om3-InstallSnapshotThread] WARN 
org.apache.hadoop.hdds.server.ServerUtils: ozone.om.snapshot.diff.db.dir is not 
configured. We recommend adding this setting. Falling
back to ozone.metadata.dirs instead.
2024-03-20 01:51:40,864 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.hdds.utils.NativeLibraryLoader: Loading Library: 
ozone_rocksdb_tools
2024-03-20 01:51:40,865 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down 
executorService: 'SstDumpToolExecutor'
2024-03-20 01:51:40,867 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.TrashPolicyOzone: Ozone Manager trash configuration: 
Deletion interval = 10080 minutes, Emptier interval = 1440 minutes.
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: 
OzoneManagerStateMachine is un-pausing
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: Reloaded OM state with Term: 4 and 
Index: 6100006845. Spend 1740 ms
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: Creating RPC Server
2024-03-20 01:51:41,206 [om3-InstallSnapshotThread] INFO 
org.reflections.Reflections: Reflections took 335 ms to scan 8 urls, producing 
23 keys and 661 values [using 96 cores]
2024-03-20 01:51:41,210 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class 
java.util.concurrent.LinkedBlockingQueue, queueCapacity: 20000, scheduler: 
class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2024-03-20 01:51:41,210 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ipc.Server: Listener at localhost:9862
2024-03-20 01:51:41,226 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: RPC server is re-started. Spend 356 ms.
2024-03-20 01:51:55,727 [om3-InstallSnapshotThread] INFO 
org.apache.hadoop.ozone.om.OzoneManager: Install Checkpoint is finished with 
Term: 4 and Index: 6100006845. Spend 18189 ms.
2024-03-20 01:51:55,727 [om3-InstallSnapshotThread] INFO 
org.apache.ratis.server.impl.SnapshotInstallationHandler: 
om3@group-197E298202B9: StateMachine successfully installed snapshot index 
6100006845. Reloading the StateMachine.
 {noformat}
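
To put numbers on the log above, the gaps between a few of its lines can be computed from the timestamps. This is only a minimal sketch: {{LogGap}} and {{millisBetween}} are illustrative names, not part of Ozone, and the timestamps are copied from the log lines named in the comments.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper (not part of Ozone) for measuring gaps between log lines. */
class LogGap {
  // Timestamp layout used by the OM log lines above, e.g. 2024-03-20 01:51:37,541
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  /** Returns the number of milliseconds between two log timestamps. */
  static long millisBetween(String from, String to) {
    return Duration.between(
        LocalDateTime.parse(from, LOG_TIME),
        LocalDateTime.parse(to, LOG_TIME)).toMillis();
  }

  public static void main(String[] args) {
    // "Stopping server on 9862" -> "RPC server is re-started"
    System.out.println(
        millisBetween("2024-03-20 01:51:37,541", "2024-03-20 01:51:41,226"));
    // "RPC server is re-started" -> "Install Checkpoint is finished"
    System.out.println(
        millisBetween("2024-03-20 01:51:41,226", "2024-03-20 01:51:55,727"));
  }
}
```

With these inputs the first gap is 3685 ms and the second is 14501 ms. What the thread does during the second gap is not visible in this log excerpt.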
 

> OM RPC server restarted by InstallSnapshotThread during shutdown
> ----------------------------------------------------------------
>
>                 Key: HDDS-10177
>                 URL: https://issues.apache.org/jira/browse/HDDS-10177
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Attila Doroszlai
>            Assignee: Sammi Chen
>            Priority: Major
>         Attachments: 2024-01-20T18-36-42_926-jvmRun1.dump, 
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices-output.txt, 
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.txt
>
>
> TestSnapshotBackgroundServices was successful:
> {code}
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 171.3 s -- in 
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices
> {code}
> but it timed out during post-test cluster shutdown, because it was waiting 
> indefinitely for the RPC server to stop:
> {code}
> "main" 
>    java.lang.Thread.State: WAITING
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.hadoop.ipc.Server.join(Server.java:3569)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.join(OzoneManager.java:2286)
>         at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopOM(MiniOzoneClusterImpl.java:558)
>         at 
> org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.stop(MiniOzoneHAClusterImpl.java:311)
>         at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:453)
>         at 
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.shutdown(TestSnapshotBackgroundServices.java:202)
> {code}
> The problem is that {{InstallSnapshotThread}} restarted the RPC server in the 
> meantime:
> {code}
> 2024-01-20 18:37:17,649 [main] INFO  ozone.MiniOzoneHAClusterImpl 
> (MiniOzoneHAClusterImpl.java:stop(310)) - Stopping the OzoneManager omNode-3
> 2024-01-20 18:37:17,649 [main] INFO  om.OzoneManager 
> (OzoneManager.java:stop(2204)) - omNode-3[localhost:15012]: Stopping Ozone 
> Manager
> 2024-01-20 18:37:17,650 [main] INFO  ipc.Server (Server.java:stop(3523)) - 
> Stopping server on 15012
> ...
> 2024-01-20 18:37:17,913 [omNode-3-InstallSnapshotThread] INFO  ipc.Server 
> (Server.java:<init>(1287)) - Listener at localhost:15012
> 2024-01-20 18:37:17,932 [omNode-3-InstallSnapshotThread] INFO  
> om.OzoneManager (OzoneManager.java:installCheckpoint(3863)) - RPC server is 
> re-started. Spend 377 ms.
> {code}
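
The interleaving described above can be replayed in a small single-threaded model. This is a sketch under stated assumptions, not the actual OzoneManager code or the fix applied for this ticket: {{SketchRpcServer}}, {{SketchManager}} and the {{guardRestart}} flag are hypothetical names, and the "guard" is just one possible way to avoid the restart.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for an RPC server; not the Hadoop ipc.Server API. */
class SketchRpcServer {
  private volatile boolean running;
  void start() { running = true; }
  void stop() { running = false; }
  boolean isRunning() { return running; }
}

/** Hypothetical model of the manager lifecycle seen in the logs. */
class SketchManager {
  private final AtomicBoolean stopRequested = new AtomicBoolean(false);
  private final boolean guardRestart;
  private volatile SketchRpcServer rpc = new SketchRpcServer();

  SketchManager(boolean guardRestart) {
    this.guardRestart = guardRestart;
    rpc.start();
  }

  /** Shutdown path, corresponding to "Stopping server on ..." from main. */
  void stop() {
    stopRequested.set(true);
    rpc.stop();
  }

  /** First half of the checkpoint install: the RPC server is stopped. */
  void beginInstall() {
    rpc.stop();
  }

  /** Second half: state is reloaded and a new RPC server is started. */
  void finishInstall() {
    if (guardRestart && stopRequested.get()) {
      return; // shutdown already requested: do not bring the server back
    }
    rpc = new SketchRpcServer();
    rpc.start();
  }

  boolean isRpcRunning() {
    return rpc.isRunning();
  }

  /** Replays the interleaving: install starts, stop() runs, install finishes. */
  static boolean rpcRunningAfterShutdown(boolean guardRestart) {
    SketchManager om = new SketchManager(guardRestart);
    om.beginInstall();
    om.stop();
    om.finishInstall();
    return om.isRpcRunning();
  }
}
```

Without the guard, {{rpcRunningAfterShutdown(false)}} leaves a running server behind after {{stop()}} has returned, which is the state a later {{join()}} would wait on. With the guard, the server stays stopped. In real multi-threaded code the flag check and the restart would also need to be made atomic with respect to {{stop()}}, for example under a shared lock; the sketch omits that.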



