keith-turner opened a new issue, #4324:
URL: https://github.com/apache/accumulo/issues/4324
Compaction and bulk load fate operations write transient per-tablet markers
to the metadata table and clean them up when they finish. The cleanup only
covers the range of the fate operation. A concurrent split could copy these
markers to tablets outside that range, causing the cleanup to miss them. There
are at least two possible ways to fix this. One is to make the cleanup scan
the whole table, which is expensive for an edge case that will rarely happen.
The other is to periodically remove markers that have no live fate
operation. The bulk code used to clean up the entire table but was modified to
only consider the range into which data was imported.
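To make the failure mode concrete, here is a toy model of it. This is only a
sketch: the class SplitMarkerLeakSketch and all of its methods are hypothetical,
tablets are reduced to an end row plus a set of fate ids, and a split is
assumed to copy the parent's markers to the new tablet. None of this is
Accumulo code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy model: a tablet is keyed by its end row and covers (previous end row, end row]. */
final class SplitMarkerLeakSketch {

  // end row -> fate ids that have left a marker on that tablet
  final TreeMap<String, Set<String>> tablets = new TreeMap<>();

  /** Marker sets of tablets overlapping (start, end]; assumes some tablet covers end. */
  private Iterable<Set<String>> overlapping(String start, String end) {
    return tablets.subMap(start, false, tablets.ceilingKey(end), true).values();
  }

  void writeMarkers(String start, String end, String fateId) {
    overlapping(start, end).forEach(markers -> markers.add(fateId));
  }

  void cleanupRange(String start, String end, String fateId) {
    overlapping(start, end).forEach(markers -> markers.remove(fateId));
  }

  /** Splits the tablet covering splitRow; the new tablet inherits a copy of its markers. */
  void split(String splitRow) {
    Set<String> parentMarkers = tablets.ceilingEntry(splitRow).getValue();
    tablets.put(splitRow, new HashSet<>(parentMarkers));
  }

  /** End rows of the tablets that still carry a marker for fateId. */
  List<String> tabletsWithMarker(String fateId) {
    List<String> result = new ArrayList<>();
    tablets.forEach((endRow, markers) -> {
      if (markers.contains(fateId)) {
        result.add(endRow);
      }
    });
    return result;
  }

  /** Writes markers, splits while the operation is in flight, then cleans up by range. */
  static SplitMarkerLeakSketch runScenario() {
    SplitMarkerLeakSketch table = new SplitMarkerLeakSketch();
    table.tablets.put("g", new HashSet<>()); // covers (-inf, g]
    table.tablets.put("~", new HashSet<>()); // covers (g, ~], stands in for the last tablet
    table.writeMarkers("a", "m", "fate-1"); // marks both tablets
    table.split("p"); // (g, ~] becomes (g, p] and (p, ~], both marked
    table.cleanupRange("a", "m", "fate-1"); // only reaches (-inf, g] and (g, p]
    return table;
  }

  public static void main(String[] args) {
    System.out.println("tablets still marked: " + runScenario().tabletsWithMarker("fate-1"));
  }
}
```

In this model the tablet with end row "~" lies entirely outside the range
(a, m] after the split, so the range-limited cleanup never visits it and its
marker is left behind.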
The periodic cleanup could be done by the Accumulo GC process by running a
loop like the following.
```java
for (Ample.DataLevel level : Ample.DataLevel.values()) {
  // TODO create a cache that loads live ids from the fate store
  LoadingCache<FateId, FateId> liveFateIds = ..;
  // TODO create a new filter that only returns tablets that have loaded or
  // compacted markers
  for (TabletMetadata tm :
      ample.readTablets().forLevel(level).filter(new LoadedCompactedFilter()).build()) {
    // important that live ids are only loaded for the first time after being
    // seen in the metadata table
    // a marker is stale when the cache returns null, i.e. no live fate
    // operation exists for its id
    // assumes getLoaded() maps file -> fate id, so the id is the entry value
    var loadedToDelete = tm.getLoaded().entrySet().stream()
        .filter(e -> liveFateIds.get(e.getValue()) == null)
        .collect(Collectors.toList());
    var compactedToDelete = tm.getCompacted().stream()
        .filter(fateId -> liveFateIds.get(fateId) == null)
        .collect(Collectors.toList());
    if (!loadedToDelete.isEmpty() || !compactedToDelete.isEmpty()) {
      // TODO update table to delete markers
    }
  }
}
```
For testing, test code could manually insert data into fate and the metadata
table, then call the above code and check that it deletes or does not delete
markers as expected.
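The shape of such a test can be sketched without Accumulo at all. The example
below is an assumption-laden stand-in, not the proposed implementation: the
class StaleMarkerCleanupSketch, its Tablet type, and removeStaleMarkers are
hypothetical, fate ids are plain strings, and the fate store is reduced to a
predicate that says whether an id is live. Liveness is looked up lazily, only
after an id has been seen on a tablet, which mirrors the ordering the comment
in the loop above calls important.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class StaleMarkerCleanupSketch {

  /** Hypothetical stand-in for the loaded and compacted markers of one tablet. */
  static final class Tablet {
    final Map<String, String> loaded = new HashMap<>(); // file -> fate id
    final Set<String> compacted = new HashSet<>(); // fate ids
  }

  /**
   * Removes every marker whose fate id is not live and returns how many were removed.
   * isLive stands in for a fate store lookup and is consulted at most once per id.
   */
  static int removeStaleMarkers(Collection<Tablet> tablets, Predicate<String> isLive) {
    // Liveness is cached lazily: an id is looked up only after a marker with that id is seen.
    Map<String, Boolean> liveCache = new HashMap<>();
    Predicate<String> isStale = id -> !liveCache.computeIfAbsent(id, isLive::test);
    int removed = 0;
    for (Tablet tablet : tablets) {
      int before = tablet.loaded.size() + tablet.compacted.size();
      tablet.loaded.values().removeIf(isStale);
      tablet.compacted.removeIf(isStale);
      removed += before - (tablet.loaded.size() + tablet.compacted.size());
    }
    return removed;
  }

  public static void main(String[] args) {
    Tablet tablet = new Tablet();
    tablet.loaded.put("f1.rf", "live-1"); // marker of a running operation, must survive
    tablet.loaded.put("f2.rf", "dead-1"); // marker of a finished operation, must go
    tablet.compacted.add("live-1");
    tablet.compacted.add("dead-1");

    Set<String> liveFateIds = Set.of("live-1");
    int removed = removeStaleMarkers(List.of(tablet), liveFateIds::contains);
    System.out.println("removed=" + removed + " loaded=" + tablet.loaded
        + " compacted=" + tablet.compacted);
  }
}
```

A real test would presumably replace the predicate with the fate store and the
Tablet objects with rows in the metadata table, while keeping the same two
checks: markers with a live fate id survive and markers without one are
deleted.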