keith-turner commented on a change in pull request #2320:
URL: https://github.com/apache/accumulo/pull/2320#discussion_r745777393
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java
##########
@@ -1403,6 +1402,23 @@ private void closeConsistencyCheck() {
// TODO check lastFlushID and lostCompactID - ACCUMULO-1290
}
+ private void compareToDataInMemory(TabletMetadata tabletMetadata) {
+ if
(!tabletMetadata.getFilesMap().equals(getDatafileManager().getDatafileSizes()))
{
+ String msg = "Data files in " + extent + " differ from in-memory data "
+ + tabletMetadata.getFilesMap() + " " +
getDatafileManager().getDatafileSizes();
+ log.error(msg);
+ throw new RuntimeException(msg);
+ }
+ }
+
+ public void compareTabletInfo(Pair<Long,TabletMetadata> tabletInfo) {
+ // if the counter didn't change, compare metadata to what is in memory
+ if (tabletInfo.getFirst() == this.getUpdateCounter()) {
+ this.compareToDataInMemory(tabletInfo.getSecond());
+ }
+ // if counter did change, don't compare metadata and try again later
+ }
Review comment:
Need to synchronize this method on the tablet lock and check if the
tablet is closed. We gathered a snapshot of all online tablets at a point in
time, it possible that some of the tablets in that snapshot are closed when we
try to inpsect them. The reason we need to syncronize on the tablet lock is
that we are inspecting multiple tablet instance vars (closed state, update
counter, and files map) and we want to consider all of the instance vars
atomically. W/o sync the closed state can change after we inspect it, the
update counter can change after we inspect it, etc.
```suggestion
public synchronized void compareTabletInfo(Pair<Long,TabletMetadata>
tabletInfo) {
if(isClosing() || isClosed()) {
return;
}
// if the counter didn't change, compare metadata to what is in memory
if (tabletInfo.getFirst() == this.getUpdateCounter()) {
this.compareToDataInMemory(tabletInfo.getSecond());
}
// if counter did change, don't compare metadata and try again later
}
```
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java
##########
@@ -874,6 +877,7 @@ public void flush(long tableFlushID) {
} else {
initiateMinor = true;
}
+ updateCounter.getAndIncrement();
Review comment:
I think it would be better to place this counter in the DataFileManager
class. Also would not make it an atomic long, would just make it a long. The
variable it tightly coupled to the map dataFileSizes in DataFileManager. Any
time that map is update, the counter needs to be incremented. This map update
and counter increment should all be done in the same sync block, so there is no
need to make the counter an atomic long. Making it an atomic long gives the
impression that it can be accessed on its own when in reality it should only be
accessed in the same sync that controls access to the map.
The DataFileManager class relies on being accessed when the tablet lock is
held, so it does not really need its own synch.
So could add the counter to the DataFileManager class and updated whenever
the map is updated and also add a method to get the counter from
DataFileManager. Rely on the tablet lock for sync, there is another comment
related to this.
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java
##########
@@ -791,6 +791,27 @@ public void run() {
}
}, 0, 5000, TimeUnit.MILLISECONDS);
+
ThreadPools.createGeneralScheduledExecutorService(aconf).scheduleWithFixedDelay(()
-> {
+ final SortedMap<KeyExtent,Tablet> onlineTabletsSnapshot =
onlineTablets.snapshot();
+ final SortedMap<KeyExtent,Pair<Long,TabletMetadata>>
tabletValidationInfo = new TreeMap<>();
+
+ // gather update counters and metadata for all tablets
+ for (var entry : onlineTabletsSnapshot.entrySet()) {
+ KeyExtent keyExtent = entry.getKey();
+ Long counter = entry.getValue().getUpdateCounter();
+ TabletMetadata tm = getContext().getAmple().readTablet(keyExtent,
+ TabletMetadata.ColumnType.FILES, TabletMetadata.ColumnType.LOGS,
+ TabletMetadata.ColumnType.ECOMP,
TabletMetadata.ColumnType.PREV_ROW);
Review comment:
This will read tablets one at a time from the metadata table resulting
in an RPC per tablet. Should use the readTablets ample method to read the
tablet which will use a batch scanner which can result in a lot less RPCs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]