keith-turner commented on PR #3350:
URL: https://github.com/apache/accumulo/pull/3350#issuecomment-1530329288
Experimented with changing just the RefreshTablets Repo for the FATE
operation to see what it would look like if we dropped the refresh column and
only sent RPCs.
```java
public class RefreshTablets extends ManagerRepo {
public long isReady(long tid, Manager manager) throws Exception {
// nothing to do here, always ready to run
return 0;
}
@Override
public Repo<Manager> call(long tid, Manager manager) throws Exception {
Map<Location,List<KeyExtent>> refreshesNeeded;
try (var tablets =
manager.getContext().getAmple().readTablets().forTable(bulkInfo.tableId).checkConsistency()
.fetch(ColumnType.LOADED, ColumnType.LOCATION,
ColumnType.PREV_ROW).build()) {
// Find all tablets that have a location and load markers for this
bulk load operation and
// therefore need to refresh their metadata. There may be some tablets
in the map that were
// hosted after the bulk files were set and that is ok, it just
results in an unneeded refresh
// request. There may also be tablets that had a location when the
files were set but do not
// have a location now, that is ok the next time that tablet loads
somewhere it will see the
// files.
refreshesNeeded = tablets.stream()
.filter(tabletMetadata -> tabletMetadata.getLocation() != null
&& tabletMetadata.getLoaded().containsValue(tid))
.collect(groupingBy(TabletMetadata::getLocation,
mapping(TabletMetadata::getExtent, toList())));
}
HashSet<Location> completed = new HashSet<>();
while (!refreshesNeeded.isEmpty()) {
for (Map.Entry<Location,List<KeyExtent>> entry :
refreshesNeeded.entrySet()) {
try {
// TODO use a thread pool to send RPC request
// Ask tablet server to reload the metadata for these tablets. If
the tabletserver is no
// longer serving a tablet that is ok as long is reloads the ones
that it is
// still serving. If a tablet was unloaded after the map was
created above when it
// reloads elsewhere it will see the bulk files. The tserver will
need to ensure tablets
// that had future location and are in the process of loading are
handled correctly or
// the refresh request could be missed due to race conditions in
the tablet server.
sendSyncRefreshRequest(entry.getKey(), entry.getValue());
// refresh request returned successfully, so mark the location as
completed
completed.add(entry.getKey());
} catch (Exception e) { // TODO what exceptions are ok to catch and
retry on? this
// exception is too broad
if (tserverIsDead(entry.getKey())) {
// if the tserver is no longer alive then when tablets load
elsewhere they will see the files, so mark it completed
completed.add(entry.getKey());
} else {
// TODO log something about waiting on this tserver for bulk
import refresh request
}
}
}
// remove all tablet server that were successfully refreshed
refreshesNeeded.keySet().removeAll(completed);
if (!refreshesNeeded.isEmpty()) {
// TODO sleep and log using a retry object
}
}
return new CleanUpBulkImport(bulkInfo);
}
}
```
Learned a few things from this. Will probably need a thread pool and will
need to handle tservers that die after getting the information from the
metadata table. I also realized there could be some tricky issues with future
locations with this approach, but those should be ok also long as the tserver
handles concurrent tablet loads and refresh request correctly. I can't think
of any cases with this approach where a tablet that needs to refresh does not.
Thinking of switching to this approach. It has pluses and minuses compared
to the refresh column. Thinking its simpler to start with, likely faster, and
if any of the problems the refresh column addresses show up then we can
reconsider it later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]