[
https://issues.apache.org/jira/browse/HDDS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868486#comment-17868486
]
Hemant Kumar commented on HDDS-11078:
-------------------------------------
Hi [~adoroszlai], this change is causing a delay in the shutdown of the
process, and the shutdown hook also seems to be broken after the change.
OM logs without the change:
{code:java}
2024-07-24 02:04:07,317 ERROR [SIGTERM
handler]-org.apache.hadoop.ozone.om.OzoneManagerStarter: RECEIVED SIGNAL 15:
SIGTERM
2024-07-24 02:04:07,320 INFO
[shutdown-hook-0]-org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down OzoneManager at
ccycloud-1.iamgroot-test.root.comops.site/10.140.132.193
************************************************************/
2024-07-24 02:04:07,325 INFO
[shutdown-hook-0]-org.apache.hadoop.ozone.om.OzoneManager:
om38[ccycloud-1.iamgroot-test.root.comops.site:9862]: Stopping Ozone Manager
2024-07-24 02:04:07,326 INFO [shutdown-hook-0]-org.apache.hadoop.ipc.Server:
Stopping server on 9862
2024-07-24 02:04:07,339 INFO [IPC Server listener on
9862]-org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 9862
2024-07-24 02:04:07,341 INFO [IPC Server
Responder]-org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2024-07-24 02:04:09,330 INFO
[om38-KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService:
Send 2 key(s) to SCM, first 2 keys:
[BlockGroup[groupID='/s3v/cloudera-health-monitoring-ozone-basic-canary-bucket/cloudera-health-monitoring-ozone-basic-canary-key/-9223372036852065535',
blockIDs=[conID: 1002 locID: 113750153625603001 bcsId: 0 replicaIndex: null]],
BlockGroup[groupID='/s3v/cloudera-health-monitoring-ozone-basic-canary-bucket/cloudera-health-monitoring-ozone-basic-canary-key/-9223372036852068607',
blockIDs=[conID: 1003 locID: 113750153625602188 bcsId: 0 replicaIndex: null]]]
2024-07-24 02:04:09,440 INFO
[om38-KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService: 2
BlockGroup deletion are acked by SCM in 111 ms
2024-07-24 02:04:09,471 INFO
[om38-KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService:
Blocks for 2 (out of 2) keys are deleted from DB in 30 ms
2024-07-24 02:04:10,891 WARN
[grpc-default-executor-0]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om37-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
RST_STREAM closed stream. HTTP/2 error code: CANCEL
2024-07-24 02:04:10,892 WARN
[grpc-default-executor-5]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om37-GrpcLogAppender: Follower failed (request=null,
errorCount=1); keep nextIndex (10597) unchanged and retry.
2024-07-24 02:04:11,407 INFO
[shutdown-hook-0]-org.apache.hadoop.ozone.om.GrpcOzoneManagerServer: Server
GrpcOzoneManagerServer is shutdown
2024-07-24 02:04:11,408 INFO
[shutdown-hook-0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer:
Stopping org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer@7b1335ca at
port 9872
2024-07-24 02:04:11,409 INFO
[shutdown-hook-0]-org.apache.ratis.server.RaftServer: om38: close
2024-07-24 02:04:11,411 INFO
[om38-impl-thread2]-org.apache.ratis.server.RaftServer$Division:
om38@group-B7D7B4B185E7: shutdown
2024-07-24 02:04:11,411 INFO
[shutdown-hook-0]-org.apache.ratis.grpc.server.GrpcService: om38: shutdown
server GrpcServerProtocolService now
2024-07-24 02:04:11,412 INFO
[om38-impl-thread2]-org.apache.ratis.util.JmxRegister: Successfully
un-registered JMX Bean with object name
Ratis:service=RaftServer,group=group-B7D7B4B185E7,id=om38
2024-07-24 02:04:11,412 INFO
[om38-impl-thread2]-org.apache.ratis.server.impl.RoleInfo: om38: shutdown
om38@group-B7D7B4B185E7-LeaderStateImpl
2024-07-24 02:04:11,414 WARN
[om38@group-B7D7B4B185E7->om36-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om36-GrpcLogAppender: Wait interrupted by
java.lang.InterruptedException
2024-07-24 02:04:11,414 WARN
[om38@group-B7D7B4B185E7->om37-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om37-GrpcLogAppender: Wait interrupted by
java.lang.InterruptedException
2024-07-24 02:04:11,414 INFO
[om38-impl-thread2]-org.apache.ratis.server.impl.PendingRequests:
om38@group-B7D7B4B185E7-PendingRequests: sendNotLeaderResponses
2024-07-24 02:04:11,421 INFO
[grpc-default-executor-5]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om36-AppendLogResponseHandler: follower responses
appendEntries COMPLETED
2024-07-24 02:04:11,421 INFO
[grpc-default-executor-0]-org.apache.ratis.grpc.server.GrpcLogAppender:
om38@group-B7D7B4B185E7->om36-AppendLogResponseHandler: follower responses
appendEntries COMPLETED
2024-07-24 02:04:11,424 INFO
[om38-impl-thread2]-org.apache.ratis.server.impl.StateMachineUpdater:
om38@group-B7D7B4B185E7-StateMachineUpdater: set stopIndex = 10596
2024-07-24 02:04:11,424 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
applied = (t:5, i:10596)
2024-07-24 02:04:11,424 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
skipped = 10595
2024-07-24 02:04:11,424 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
notified = (t:5, i:10596)
2024-07-24 02:04:11,424 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
snapshot = (t:5, i:10596)
2024-07-24 02:04:11,425 INFO
[Thread-185]-org.apache.ratis.grpc.server.GrpcServerProtocolClient: om36 Close
channels
2024-07-24 02:04:11,425 INFO
[Thread-186]-org.apache.ratis.grpc.server.GrpcServerProtocolClient: om37 Close
channels
2024-07-24 02:04:11,432 INFO
[shutdown-hook-0]-org.apache.ratis.grpc.server.GrpcService: om38: shutdown
server GrpcServerProtocolService successfully
2024-07-24 02:04:11,458 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
om38@group-B7D7B4B185E7-StateMachineUpdater: Took a snapshot at index 10596
2024-07-24 02:04:11,459 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
om38@group-B7D7B4B185E7-StateMachineUpdater: snapshotIndex: updateIncreasingly
10584 -> 10596
2024-07-24 02:04:11,459 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
om38@group-B7D7B4B185E7-StateMachineUpdater: closing OzoneManagerStateMachine,
lastApplied=(t:5, i:10596)
2024-07-24 02:04:11,459 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
Stopping OzoneManagerStateMachine:om38:group-B7D7B4B185E7.
2024-07-24 02:04:11,459 INFO
[om38@group-B7D7B4B185E7-StateMachineUpdater]-org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer:
Stopping OMDoubleBuffer flush thread
2024-07-24 02:04:11,459 INFO
[om38-OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer:
OMDoubleBuffer flush thread om38-OMDoubleBufferFlushThread is interrupted and
will exit.
2024-07-24 02:04:11,462 INFO
[om38@group-B7D7B4B185E7-cacheEviction-AwaitToRun]-org.apache.ratis.util.AwaitToRun:
om38@group-B7D7B4B185E7-cacheEviction-AwaitToRun-AwaitForSignal is interrupted
2024-07-24 02:04:11,473 INFO
[om38-impl-thread2]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
om38@group-B7D7B4B185E7-SegmentedRaftLogWorker close()
2024-07-24 02:04:11,489 INFO
[JvmPauseMonitor0]-org.apache.ratis.util.JvmPauseMonitor: JvmPauseMonitor-om38:
Stopped
2024-07-24 02:04:11,489 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service KeyDeletingService
2024-07-24 02:04:11,490 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service DirectoryDeletingService
2024-07-24 02:04:11,490 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service OpenKeyCleanupService
2024-07-24 02:04:11,490 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service SstFilteringService
2024-07-24 02:04:11,491 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service SnapshotDeletingService
2024-07-24 02:04:11,491 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service MultipartUploadCleanupService
2024-07-24 02:04:11,492 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service SnapshotDirectoryCleaningService
2024-07-24 02:04:11,514 INFO
[shutdown-hook-0]-org.eclipse.jetty.server.handler.ContextHandler: Stopped
o.e.j.w.WebAppContext@738a815c{ozoneManager,/,null,STOPPED}{jar:file:/opt/cloudera/parcels/OZONE-719.3.0-1.ozone719.3.0.p0.55731689/lib/hadoop-ozone/share/ozone/lib/ozone-manager-1.5.0.719.3.0-b15.jar!/webapps/ozoneManager}
2024-07-24 02:04:11,518 INFO
[shutdown-hook-0]-org.eclipse.jetty.server.AbstractConnector: Stopped
ServerConnector@6b3d9c38{HTTP/1.1, (http/1.1)}{0.0.0.0:9874}
2024-07-24 02:04:11,519 INFO
[shutdown-hook-0]-org.eclipse.jetty.server.session: node0 Stopped scavenging
2024-07-24 02:04:11,520 INFO
[shutdown-hook-0]-org.eclipse.jetty.server.handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@7f642bf{static,/static,jar:file:/opt/cloudera/parcels/OZONE-719.3.0-1.ozone719.3.0.p0.55731689/lib/hadoop-ozone/share/ozone/lib/ozone-manager-1.5.0.719.3.0-b15.jar!/webapps/static,STOPPED}
2024-07-24 02:04:11,520 INFO
[shutdown-hook-0]-org.eclipse.jetty.server.handler.ContextHandler: Stopped
o.e.j.s.ServletContextHandler@ad0bb4e{logs,/logs,file:///var/log/hadoop-ozone/,STOPPED}
2024-07-24 02:04:11,523 INFO
[shutdown-hook-0]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Shutting
down CompactionDagPruningService.
2024-07-24 02:04:11,533 INFO
[shutdown-hook-0]-org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager:
Shutting down executorService: 'SnapDiffExecutor'
2024-07-24 02:04:11,534 INFO
[shutdown-hook-0]-org.apache.hadoop.hdds.utils.BackgroundService: Shutting down
service SnapshotDiffCleanupService{code}
With the change (the SIGTERM is logged, but no shutdown-hook output follows):
{code:java}
2024-07-24 01:50:45,343 ERROR [SIGTERM
handler]-org.apache.hadoop.ozone.om.OzoneManagerStarter: RECEIVED SIGNAL 15:
SIGTERM
2024-07-24 01:53:07,722 ERROR [SIGTERM
handler]-org.apache.hadoop.ozone.om.OzoneManagerStarter: RECEIVED SIGNAL 15:
SIGTERM
2024-07-24 02:02:41,903 INFO
[main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting OzoneManager
STARTUP_MSG: host = ccycloud-1.iamgroot-test.root.comops.site/10.140.132.193
STARTUP_MSG: args = [--init]
STARTUP_MSG: version = 1.5.0.719.3.0-b15 {code}
One more example:
{code:java}
2024-07-19 00:13:57,383 ERROR [SIGTERM
handler]-org.apache.hadoop.ozone.om.OzoneManagerStarter: RECEIVED SIGNAL 15:
SIGTERM
2024-07-19 00:14:21,872 WARN
[grpc-default-executor-1]-org.apache.ratis.grpc.server.GrpcServerProtocolService:
om134: APPEND_ENTRIES onError, lastRequest: null:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client
cancelled
2024-07-19 00:14:21,879 WARN
[grpc-default-executor-3]-org.apache.ratis.grpc.server.GrpcServerProtocolService:
om134: APPEND_ENTRIES onError, lastRequest: om135->om134#1-t1,previous=(t:0,
i:0),leaderCommit=-1,initializing? false,entries: size=1, first=(t:1, i:0),
CONFIGURATIONENTRY(current:id:"om136"address:"ccycloud-4.quasar-oxbqig.root.comops.site:9872"startupRole:FOLLOWER,
id:"om134"address:"ccycloud-1.quasar-oxbqig.root.comops.site:9872"startupRole:FOLLOWER,
id:"om135"address:"ccycloud-5.quasar-oxbqig.root.comops.site:9872"startupRole:FOLLOWER,
old:): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED:
client cancelled{code}
The same happens with other services, such as SCM and Datanode.
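For comparison, here is a minimal sketch of a JVM shutdown hook registered through the public {{Runtime.addShutdownHook}} API, which the JVM runs on SIGTERM without any use of {{sun.misc.Signal}}. This is not the Ozone or Hadoop implementation: the class name {{ShutdownHookSketch}}, the method {{registerShutdownHook}}, and the {{STOPPED}} flag are hypothetical and only illustrate the behavior expected in the "without the change" log above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {
  // Hypothetical flag standing in for "the service has been stopped".
  static final AtomicBoolean STOPPED = new AtomicBoolean(false);

  /**
   * Registers a JVM shutdown hook that logs a message and then runs the
   * given stop action. The JVM starts registered hooks on normal exit and
   * when it is terminated by SIGTERM or SIGINT (but not by SIGKILL).
   */
  static Thread registerShutdownHook(Runnable stopAction) {
    Thread hook = new Thread(() -> {
      System.err.println("SHUTDOWN_MSG: stopping service");
      stopAction.run();
    }, "shutdown-hook-0");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  public static void main(String[] args) throws InterruptedException {
    registerShutdownHook(() -> STOPPED.set(true));
    // Optional first argument: how long to stay alive, in milliseconds,
    // so that a SIGTERM can be sent from another terminal.
    long waitMs = args.length > 0 ? Long.parseLong(args[0]) : 0L;
    Thread.sleep(waitMs);
  }
}
```

Running it with a wait of, say, {{60000}} and sending {{kill <pid>}} should print the shutdown message almost immediately. A long gap between the SIGTERM and the hook output, or no hook output at all, would match the behavior reported above.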
> Remove usage of sun.misc.Signal
> -------------------------------
>
> Key: HDDS-11078
> URL: https://issues.apache.org/jira/browse/HDDS-11078
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.0
>
>
> {code}
> sun.misc.Signal is internal proprietary API and may be removed in a future
> release
> {code}
> The goal of this task is to replace usage of {{sun.misc.Signal}}.