István Fajth created HDDS-11472:
-----------------------------------
Summary: Multiple IOzoneAuthorizer instances may be created at
install snapshot failure
Key: HDDS-11472
URL: https://issues.apache.org/jira/browse/HDDS-11472
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Manager
Reporter: István Fajth
If a failure occurs while installing a Ratis snapshot, after the metadata
manager has been stopped, then as part of the flow we call
OzoneManager#reloadOMState() from OzoneManager#installCheckpoint.
As part of the reloadOMState call, we re-create the IAccessAuthorizer instance
with the help of OzoneAuthorizerFactory. The old instance may stay alive,
however: even though we replace the reference in OzoneManager, there might be
other places that still reference the old object.
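The retention pattern described above can be illustrated with a minimal,
self-contained Java sketch. It is not Ozone code: SketchManager,
SketchAuthorizer and BackgroundTasks are hypothetical stand-ins, and the
assumption that the extra references come from something like a plugin's
background tasks is made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorizerLeakSketch {

    // Hypothetical registry standing in for the "other places" that keep a
    // reference to an authorizer (assumed here: plugin background tasks).
    static final class BackgroundTasks {
        final List<SketchAuthorizer> holders = new ArrayList<>();
    }

    // Hypothetical authorizer; in Ozone this would be an IAccessAuthorizer.
    static final class SketchAuthorizer {
        final int generation;

        SketchAuthorizer(int generation, BackgroundTasks tasks) {
            this.generation = generation;
            // The instance registers itself and is never deregistered,
            // because nothing stops or closes it when it is replaced.
            tasks.holders.add(this);
        }
    }

    // Hypothetical stand-in for OzoneManager; only reference handling is modelled.
    static final class SketchManager {
        private final BackgroundTasks tasks;
        private SketchAuthorizer authorizer;
        private int generation = 0;

        SketchManager(BackgroundTasks tasks) {
            this.tasks = tasks;
            reloadState();
        }

        // Replaces the manager's own reference with a fresh instance,
        // leaving the previous instance reachable through the registry.
        void reloadState() {
            authorizer = new SketchAuthorizer(++generation, tasks);
        }

        SketchAuthorizer currentAuthorizer() {
            return authorizer;
        }
    }

    // Simulates the given number of failed snapshot installs and returns
    // how many authorizer instances are still reachable afterwards.
    static int instancesAfterFailures(int failures) {
        BackgroundTasks tasks = new BackgroundTasks();
        SketchManager om = new SketchManager(tasks);
        for (int i = 0; i < failures; i++) {
            om.reloadState();
        }
        return tasks.holders.size();
    }

    public static void main(String[] args) {
        // One initial instance plus one per failed install.
        System.out.println(instancesAfterFailures(3)); // prints 4
    }
}
```

In this sketch every reload adds one more reachable instance, so the number
of retained authorizers grows linearly with the number of failures.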
If the authorizer object refers to a significant amount of data, a repeating
failure like the one described in HDDS-10300 can fill up the heap of
Ozone Manager. Internally we ran into this with an Atlas+Ranger+Ozone
setup, where the plugin refers to ~2GB worth of objects in the heap and was
present multiple times in a heap dump. This condition led to a crash due to
long GC pauses.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)