Copilot commented on code in PR #9162:
URL: https://github.com/apache/gravitino/pull/9162#discussion_r2559215231
##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -57,20 +82,28 @@ public ReverseIndexCache() {
GenericEntity.class,
ReverseIndexRules.GENERIC_METADATA_OBJECT_REVERSE_RULE);
}
- public boolean remove(EntityCacheKey key) {
- return reverseIndex.remove(key.toString());
- }
-
public Iterable<List<EntityCacheKey>> getValuesForKeysStartingWith(String
keyPrefix) {
return reverseIndex.getValuesForKeysStartingWith(keyPrefix);
}
- public Iterable<CharSequence> getKeysStartingWith(String keyPrefix) {
- return reverseIndex.getKeysStartingWith(keyPrefix);
- }
+ public boolean remove(EntityCacheKey key) {
+ List<EntityCacheKey> relatedKeys = entityToReverseIndexMap.remove(key);
+ if (CollectionUtils.isNotEmpty(relatedKeys)) {
+ for (EntityCacheKey relatedKey : relatedKeys) {
+ List<EntityCacheKey> existingKeys = reverseIndex.getValueForExactKey(relatedKey.toString());
+ if (existingKeys != null && existingKeys.contains((key))) {
Review Comment:
Redundant parentheses around the argument of the `contains()` call:
`existingKeys.contains((key))` should be `existingKeys.contains(key)`.
```suggestion
if (existingKeys != null && existingKeys.contains(key)) {
```
##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -79,7 +112,8 @@ public int size() {
public void put(
NameIdentifier nameIdentifier, Entity.EntityType type,
EntityCacheRelationKey key) {
- EntityCacheKey entityCacheKey = EntityCacheKey.of(nameIdentifier, type);
+ EntityCacheRelationKey entityCacheKey = EntityCacheRelationKey.of(nameIdentifier, type);
Review Comment:
Inconsistent key type: this method now builds an `EntityCacheRelationKey`
on line 115, while the `get()` method at line 133 still uses
`EntityCacheKey.of()` for the same lookup. If the two types differ in
`equals()`, `hashCode()`, or `toString()`, an entry written by `put()` may not
be found by `get()`. Both methods should build the key the same way: either
both use `EntityCacheKey.of(nameIdentifier, type)` or both use
`EntityCacheRelationKey.of(nameIdentifier, type)`. Because this variable is a
lookup key into the reverseIndex and must match what `get()` uses, it should
be an `EntityCacheKey`.
```suggestion
EntityCacheKey entityCacheKey = EntityCacheKey.of(nameIdentifier, type);
```
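To illustrate why the key type matters, here is a minimal, self-contained sketch. `BaseKey` and `RelationKey` are hypothetical stand-ins, not the Gravitino classes, and the sketch assumes a class-strict `equals()`; whether the real `EntityCacheKey` and `EntityCacheRelationKey` behave this way depends on their actual `equals()`/`hashCode()`/`toString()` implementations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyTypeDemo {
  // Hypothetical stand-in for a base cache key (NOT the Gravitino class).
  static class BaseKey {
    final String name;
    final String type;

    BaseKey(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      // Assumption: class-strict equality, so a subclass instance never equals a BaseKey.
      if (o == null || getClass() != o.getClass()) return false;
      BaseKey other = (BaseKey) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  // Hypothetical stand-in for a relation key that extends the base key.
  static class RelationKey extends BaseKey {
    RelationKey(String name, String type) {
      super(name, type);
    }
  }

  // Writes with one key and looks up with another; reports whether the lookup hits.
  static boolean lookupSucceeds(BaseKey writeKey, BaseKey readKey) {
    Map<BaseKey, String> index = new HashMap<>();
    index.put(writeKey, "value");
    return index.containsKey(readKey);
  }

  public static void main(String[] args) {
    // Same type on both sides: prints true.
    System.out.println(
        lookupSucceeds(new BaseKey("role1", "ROLE"), new BaseKey("role1", "ROLE")));
    // Written as RelationKey, read as BaseKey: prints false under the assumption above.
    System.out.println(
        lookupSucceeds(new RelationKey("role1", "ROLE"), new BaseKey("role1", "ROLE")));
  }
}
```

Under these assumptions the second lookup misses even though both keys carry the same name and type, which is the kind of silent mismatch the comment above warns about.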
##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -45,6 +47,29 @@ public class ReverseIndexCache {
/** Registers a reverse index processor for a specific entity class. */
private final Map<Class<? extends Entity>, ReverseIndexRule>
reverseIndexRules = new HashMap<>();
+ /**
+ * Map from data entity key to a list of entity cache relation keys. This is
used for reverse
+ * indexing.
+ *
+ * <p>For example, a role entity may be related to multiple securable
objects, so we need to
+ * maintain a mapping from the role entity key to the list of securable
object keys. that is
+ * dataToReverseIndexMap: roleEntityKey -> [securableObjectKey1,
securableObjectKey2, ...]
+ *
+ * <p>This map is used to quickly find all the related entity cache keys
when we need to
+ * invalidate in the reverse index if a role entity is updated. The
following is an example: a
+ * Role a has securable objects s1 and s2, so we have the following mapping:
<br>
+ * cacheData: role1 -> role entity </br> <br>
+ * reverseIndex: s1 -> [role1], s2 -> [role1] </br>
+ *
+ * <p>This map will be: <br>
+ * role1 -> [s1, s2] </br>
+ *
+ * <p>When we update role1, we need to invalidate s1 and s2 from the reverse
index, or the data
+ * will be in the memory forever. However, the main branch before this PR
does not support this
+ * operation directly as we do not maintain such a map.
+ */
+ private Map<EntityCacheKey, List<EntityCacheKey>> entityToReverseIndexMap = Maps.newHashMap();
Review Comment:
Thread-safety issue: `entityToReverseIndexMap` is a plain `HashMap` (created
via `Maps.newHashMap()`), but it is read and written by `put()` and
`remove()`, which can run concurrently. The `reverseIndex` itself is a
`ConcurrentRadixTree` and is safe for concurrent use, but this new map can be
touched at the same time by threads holding different lock segments in
`CaffeineEntityCache`. Unsynchronized concurrent writes to a `HashMap` can
lose updates or corrupt its internal state. Use a thread-safe map instead:
`Maps.newConcurrentMap()` or `new ConcurrentHashMap<>()`.
```suggestion
private Map<EntityCacheKey, List<EntityCacheKey>> entityToReverseIndexMap = Maps.newConcurrentMap();
```
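One caveat worth sketching: switching the map type makes individual map operations atomic, but the `List` values would still be ordinary lists. The example below is a minimal sketch, not the PR's implementation. It uses `String` keys in place of `EntityCacheKey`, and the class and method names (`ConcurrentForwardMapDemo`, `addRelation`) are hypothetical. It relies on `ConcurrentHashMap.compute()` running atomically per key and replaces the list on each update rather than mutating it in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentForwardMapDemo {
  // Forward map: entity key -> related keys (String stands in for EntityCacheKey).
  private final ConcurrentMap<String, List<String>> forward = new ConcurrentHashMap<>();

  // compute() is atomic per key, so concurrent appends to the same entry are not lost.
  void addRelation(String entityKey, String relatedKey) {
    forward.compute(
        entityKey,
        (k, existing) -> {
          // Copy-on-write: never mutate a list another thread may be reading.
          List<String> updated = existing == null ? new ArrayList<>() : new ArrayList<>(existing);
          updated.add(relatedKey);
          return updated;
        });
  }

  // Atomically detaches and returns the related keys, or null if none were recorded.
  List<String> remove(String entityKey) {
    return forward.remove(entityKey);
  }

  // Eight threads each append 200 relations to the same entity key.
  static int runDemo() {
    ConcurrentForwardMapDemo demo = new ConcurrentForwardMapDemo();
    int threads = 8;
    int perThread = 200;
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int id = t;
      workers[t] =
          new Thread(
              () -> {
                for (int i = 0; i < perThread; i++) {
                  demo.addRelation("role1", "s" + id + "-" + i);
                }
              });
      workers[t].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return demo.remove("role1").size();
  }

  public static void main(String[] args) {
    System.out.println(runDemo()); // prints 1600
  }
}
```

Copying the list on every update costs more than an in-place append; it is chosen here only to keep the sketch obviously safe for readers of the list. Whether the PR needs this depends on how `put()` and `remove()` are already guarded by the cache's locks.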
##########
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java:
##########
@@ -45,6 +47,29 @@ public class ReverseIndexCache {
/** Registers a reverse index processor for a specific entity class. */
private final Map<Class<? extends Entity>, ReverseIndexRule>
reverseIndexRules = new HashMap<>();
+ /**
+ * Map from data entity key to a list of entity cache relation keys. This is
used for reverse
+ * indexing.
+ *
+ * <p>For example, a role entity may be related to multiple securable
objects, so we need to
+ * maintain a mapping from the role entity key to the list of securable
object keys. that is
+ * dataToReverseIndexMap: roleEntityKey -> [securableObjectKey1,
securableObjectKey2, ...]
Review Comment:
Documentation inconsistency: The comment on line 56 refers to
"dataToReverseIndexMap" but the actual field name on line 71 is
"entityToReverseIndexMap". The documentation should use the correct field name
to avoid confusion.
```suggestion
* entityToReverseIndexMap: roleEntityKey -> [securableObjectKey1, securableObjectKey2, ...]
```
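For readers following the Javadoc under review, the relationship between the two structures can be sketched as below. This is a simplified, single-threaded illustration with `String` keys and plain `HashMap`s, not the PR's code: the class name `ReverseIndexSketch` and its methods are hypothetical, and the real `reverseIndex` is a radix tree keyed by `toString()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReverseIndexSketch {
  // reverseIndex: related key -> entities that reference it, e.g. s1 -> [role1].
  private final Map<String, List<String>> reverseIndex = new HashMap<>();
  // entityToReverseIndexMap: entity -> related keys it registered, e.g. role1 -> [s1, s2].
  private final Map<String, List<String>> entityToReverseIndexMap = new HashMap<>();

  // Records the relation in both directions.
  void put(String entityKey, String relatedKey) {
    reverseIndex.computeIfAbsent(relatedKey, k -> new ArrayList<>()).add(entityKey);
    entityToReverseIndexMap.computeIfAbsent(entityKey, k -> new ArrayList<>()).add(relatedKey);
  }

  // Removes the entity and every reverse-index entry that points back to it.
  boolean remove(String entityKey) {
    List<String> relatedKeys = entityToReverseIndexMap.remove(entityKey);
    if (relatedKeys == null) {
      return false;
    }
    for (String relatedKey : relatedKeys) {
      List<String> owners = reverseIndex.get(relatedKey);
      if (owners != null) {
        owners.remove(entityKey);
        if (owners.isEmpty()) {
          // Drop empty entries so they do not stay in memory forever.
          reverseIndex.remove(relatedKey);
        }
      }
    }
    return true;
  }

  List<String> ownersOf(String relatedKey) {
    return reverseIndex.get(relatedKey);
  }

  int reverseIndexSize() {
    return reverseIndex.size();
  }
}
```

With `role1 -> [s1, s2]` and `role2 -> [s2]` registered, removing `role1` drops the `s1` entry entirely and leaves `s2 -> [role2]`, which is the cleanup the Javadoc describes.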