ArafatKhan2198 commented on PR #5445:
URL: https://github.com/apache/ozone/pull/5445#issuecomment-1766631471
@Pochatkin @sadanand48 thanks for the comments I have mentioned out the
problem in a detailed manner along with the proposed fix :-
### Cause of the problem ➖
- TestSecureOzoneCluster ➖
- This test class has a method which is failing intermittently, called
as `testGetSetRevokeS3Secret` which tests out the revoke feature of Ozone S3
secrets for a certain user.
https://github.com/apache/ozone/blob/377d6a674f76805d6ed4df640a87159757ed3398/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestSecureOzoneCluster.java#L698
- The code flow for the test method and the problem encountered is going
like this ➖
1. A user accessID is created
2. It attempts to get an S3 secret for the user accessID.
3. It revokes the existing S3 secret.
4. It attempts to `set` a new S3 secret for the user, which should
fail and throw an error *`ACCESS_ID_NOT_FOUND`* because the access ID
(username) has been revoked earlier for that user. In most of the cases it
throws an error but in a few of them it does not because somehow even though
the user has been revoked it is still somehow able to access the secret key for
that user and change it.
5. The reason for that was investigated by adding logs to the method
called as S3SecretManager method `getSecret` which is responsible for checking
both the cache and the DB for the secret the user requires.
https://github.com/apache/ozone/blob/377d6a674f76805d6ed4df640a87159757ed3398/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/S3SecretManagerImpl.java#L54
6. Now when we tried to set a new s3 secret for an earlier revoked
user. It first proceeded to check the cache but it could find one and returned
null because the earlier revoke method had invalidated the user accessID and
S3Secret from the cache hearby completely removing it.➖
`S3SecretValue cacheValue = s3SecretCache.get(kerberosID);`
7. Afterwards it decided to check the DB S3SecretKeyTable for the
secret for the user and for some reason it was able to fetch one and returned
it back and was later changed without throwing an error *`ACCESS_ID_NOT_FOUND`*
like it shoul. The problem here happened was that the DB was not informed about
the changes in cache that had happened yet, hence this presented an
inconsistency between the cache and DB.
`S3SecretValue result = s3SecretStore.getSecret(kerberosID);`
8. The reason for that lag was due to the fact that the background
Flush Thread had not yet commited the changes of the Cache to the DB. This was
confirmed by the logs I added where we checked the cache and the db for the
revoked secret key for the user.
```
09:44:46,260 INFO om.S3SecretManagerImpl
(S3SecretManagerImpl.java:getSecret(59)) - Get secret for
kerberosID:om/@EXAMPLE.COM from cache:null
09:44:46,260 INFO om.S3SecretManagerImpl
(S3SecretManagerImpl.java:getSecret(66)) - Get secret for
kerberosID:om/@EXAMPLE.COM from
store:ae367331576fa164eaa43fe906014b579e2df3698ef4c744c5bdc88c8dec538f
```
### Fix for the Problem ➖
- Previously, in the earlier RocksDB cache implementation, when a secret was
revoked, the system retained the access ID record, but the respective secret
was stored as null. This null value served as an indicator for the
SecretManager that a specific access ID had been revoked, eliminating the need
to check the database for that user. However, with the implementation of a
newer cache using the Guava Google library, the entire record of the access ID
and the secret key is removed from the cache upon revocation. This change poses
a challenge for the SecretManager because it can no longer distinguish whether
a requested secret has been revoked or not.
- To address this issue, we propose a fix that enables the SecretManager to
maintain the record even after revocation. Additionally, an identifier,
**`boolean isDeleted;`**, will be introduced in the S3SecretValue. This
identifier will be set to `true` whenever a secret has been revoked, and the
value of the secret itself will be set to null. Consequently, in the future, if
we attempt to access or set a revoked secret, the cache will promptly notify us
that the secret has been revoked. This way, we can avoid unnecessary database
checks, especially when the database has not been updated with the latest data.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]