[ https://issues.apache.org/jira/browse/HDDS-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aswin Shakil reassigned HDDS-8520:
----------------------------------

    Assignee: Swaminathan Balachandran

> [snapshot] OM process crash when trying to access contents of deleted snapshot through fs api
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDDS-8520
>                 URL: https://issues.apache.org/jira/browse/HDDS-8520
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Jyotirmoy Sinha
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>              Labels: ozone-snapshot
>
> Steps:
>  # Create a volume, a bucket, and a key, then create snapshot snap1.
>  # Delete snapshot snap1.
>  # Try to list the contents of the deleted snapshot snap1 with 'fs -ls'.
> OM error stack trace:
> {code:java}
> 2023-05-03 06:47:09,555 [Socket Reader #1 for port 9862] INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for [email protected] (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol
> 2023-05-03 06:47:11,287 [OM StateMachine ApplyTransaction Thread - 0] ERROR org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: PurgeDirectories
> clientId: "client-4D5F4A3C07A4"
> purgeDirectoriesRequest {
>   snapshotTableKey: "/vol2/buck1/snap1"
> }
> failed with exception
> java.lang.IllegalStateException: java.io.IOException: FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol2/buck1/snap1' is no longer active
>         at org.apache.hadoop.ozone.om.request.key.OMDirectoriesPurgeRequestWithFSO.validateAndUpdateCache(OMDirectoriesPurgeRequestWithFSO.java:133)
>         at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:337)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:567)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:358)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol2/buck1/snap1' is no longer active
>         at org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:523)
>         at org.apache.hadoop.ozone.om.request.key.OMDirectoriesPurgeRequestWithFSO.validateAndUpdateCache(OMDirectoriesPurgeRequestWithFSO.java:77)
>         ... 7 more
> Caused by: FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol2/buck1/snap1' is no longer active
>         at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:288)
>         at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
>         at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
>         at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
>         at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
>         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3966)
>         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3989)
>         at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4950)
>         at org.apache.hadoop.ozone.om.OmSnapshotManager.checkForSnapshot(OmSnapshotManager.java:521)
>         ... 8 more
> 2023-05-03 06:47:11,434 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()
> 2023-05-03 06:47:11,449 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: JVMShutdownHook: Signalling async audit cleanup to start.
> 2023-05-03 06:47:11,455 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.
> 2023-05-03 06:47:11,459 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.provider.AuditProviderFactory: RangerAsyncAuditCleanup: Starting cleanup
> 2023-05-03 06:47:11,472 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Stop called. name=ozone.async
> 2023-05-03 06:47:11,472 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Interrupting consumerThread. name=ozone.async, consumer=ozone.async.summary
> 2023-05-03 06:47:11,473 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.provider.AuditProviderFactory: RangerAsyncAuditCleanup: Done cleanup
> 2023-05-03 06:47:11,473 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.provider.AuditProviderFactory: RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
> 2023-05-03 06:47:11,473 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Caught exception in consumer thread. Shutdown might be in progress
> 2023-05-03 06:47:11,474 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Exiting polling loop. name=ozone.async
> 2023-05-03 06:47:11,474 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: JVMShutdownHook: Audit cleanup finished after 19 milli seconds
> 2023-05-03 06:47:11,474 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Calling to stop consumer. name=ozone.async, consumer.name=ozone.async.summary
> 2023-05-03 06:47:11,474 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: JVMShutdownHook: Interrupting ranger async audit cleanup thread
> 2023-05-03 06:47:11,474 [shutdown-hook-0] INFO org.apache.ranger.audit.provider.AuditProviderFactory: <== JVMShutdownHook.run()
> 2023-05-03 06:47:11,474 [Ranger async Audit cleanup] INFO org.apache.ranger.audit.provider.AuditProviderFactory: RangerAsyncAuditCleanup: Interrupted while waiting for audit startCleanup signal!  Exiting the thread...
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>         at org.apache.ranger.audit.provider.AuditProviderFactory$RangerAsyncAuditCleanup.run(AuditProviderFactory.java:531)
>         at java.lang.Thread.run(Thread.java:748)
> 2023-05-03 06:47:11,474 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Stop called. name=ozone.async.summary
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Interrupting consumerThread. name=ozone.async.summary, consumer=ozone.async.summary.batch
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditAsyncQueue0] INFO org.apache.ranger.audit.queue.AuditAsyncQueue: Exiting consumerThread.run() method. name=ozone.async
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Caught exception in consumer thread. Shutdown might be in progress
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Exiting polling loop. name=ozone.async.summary
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Calling to stop consumer. name=ozone.async.summary, consumer.name=ozone.async.summary.batch
> 2023-05-03 06:47:11,481 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditBatchQueue: Stop called. name=ozone.async.summary.batch
> 2023-05-03 06:47:11,486 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditBatchQueue: Interrupting consumerThread. name=ozone.async.summary.batch, consumer=ozone.async.summary.batch.solr
> 2023-05-03 06:47:11,486 [org.apache.ranger.audit.queue.AuditSummaryQueue0] INFO org.apache.ranger.audit.queue.AuditSummaryQueue: Exiting consumerThread.run() method. name=ozone.async.summary
> 2023-05-03 06:47:11,491 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManager: om1[jsinha-1.jsinha.root.hwx.site:9862]: Stopping Ozone Manager
> 2023-05-03 06:47:11,492 [org.apache.ranger.audit.queue.AuditBatchQueue0] ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient: Request to collection [ranger_audits] failed due to (0) java.lang.InterruptedException, retry=0 commError=false errorCode=0
> 2023-05-03 06:47:11,492 [org.apache.ranger.audit.queue.AuditBatchQueue0] INFO org.apache.solr.client.solrj.impl.BaseCloudSolrClient: request was not communication error it seems
> 2023-05-03 06:47:11,507 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at jsinha-1.jsinha.root.hwx.site/172.27.88.82
> ************************************************************/ {code}
>  
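The stack trace shows the crash path: the request being applied is a PurgeDirectories request carrying snapshotTableKey '/vol2/buck1/snap1'. OMDirectoriesPurgeRequestWithFSO.validateAndUpdateCache calls OmSnapshotManager.checkForSnapshot (line 77), which fails with FILE_NOT_FOUND because the snapshot is no longer active; the IOException is rethrown as an IllegalStateException (line 133), and OzoneManagerStateMachine then terminates the OM with exit status 1. The Java sketch here is a hypothetical, simplified model of that chain, not Ozone code: SnapshotPurgeCrashModel, SnapshotLookup, and applyTransaction are illustrative names, the Guava cache and OMException layers are collapsed into one IOException, and the "terminate" step returns a string instead of exiting the JVM.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the HDDS-8520 failure path. None of these classes
 * exist in Ozone; the names only echo frames from the stack trace.
 */
public class SnapshotPurgeCrashModel {

  /** Hypothetical stand-in for the lookup done by OmSnapshotManager.checkForSnapshot. */
  static final class SnapshotLookup {
    private final Set<String> activeTableKeys = new HashSet<>();

    void create(String tableKey) { activeTableKeys.add(tableKey); }

    void delete(String tableKey) { activeTableKeys.remove(tableKey); }

    /** Throws a checked IOException when the snapshot is no longer active. */
    String checkForSnapshot(String tableKey) throws IOException {
      if (!activeTableKeys.contains(tableKey)) {
        throw new IOException("FILE_NOT_FOUND Unable to load snapshot. Snapshot with table key '"
            + tableKey + "' is no longer active");
      }
      return tableKey;
    }
  }

  /** Hypothetical stand-in for validateAndUpdateCache on a PurgeDirectories request. */
  static void validateAndUpdateCache(SnapshotLookup lookup, String snapshotTableKey) {
    try {
      lookup.checkForSnapshot(snapshotTableKey);
      // Purging directories from the snapshot's tables is omitted in this model.
    } catch (IOException e) {
      // As in the trace, the checked exception is rethrown unchecked.
      throw new IllegalStateException(e);
    }
  }

  /**
   * Hypothetical stand-in for the state machine's apply step. Where the real OM
   * logs "Terminating with exit status 1", this model returns a label instead.
   */
  static String applyTransaction(SnapshotLookup lookup, String snapshotTableKey) {
    try {
      validateAndUpdateCache(lookup, snapshotTableKey);
      return "APPLIED";
    } catch (RuntimeException e) {
      return "TERMINATE: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    SnapshotLookup lookup = new SnapshotLookup();
    String key = "/vol2/buck1/snap1";
    lookup.create(key);
    System.out.println(applyTransaction(lookup, key)); // prints APPLIED
    lookup.delete(key);
    // prints a line starting with "TERMINATE: java.io.IOException: FILE_NOT_FOUND"
    System.out.println(applyTransaction(lookup, key));
  }
}
```

The model only illustrates why a purge request that still names a deleted snapshot escapes as an unchecked exception during apply; it does not describe how HDDS-8520 was resolved.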



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
