keith-turner commented on PR #3350:
URL: https://github.com/apache/accumulo/pull/3350#issuecomment-1530329288

   Experimented with changing just the RefreshTablets Repo for the FATE 
operation to see what it would look like if we dropped the refresh column and 
only sent RPCs.
   
   ```java
   public class RefreshTablets extends ManagerRepo {
      public long isReady(long tid, Manager manager) throws Exception {
          // nothing to do here, always ready to run
          return 0;
      }
     @Override
     public Repo<Manager> call(long tid, Manager manager) throws Exception {
   
       Map<Location,List<KeyExtent>> refreshesNeeded;
   
       try (var tablets =
           
manager.getContext().getAmple().readTablets().forTable(bulkInfo.tableId).checkConsistency()
               .fetch(ColumnType.LOADED, ColumnType.LOCATION, 
ColumnType.PREV_ROW).build()) {
   
         // Find all tablets that have a location and load markers for this 
bulk load operation and
         // therefore need to refresh their metadata. There may be some tablets 
in the map that were
         // hosted after the bulk files were set and that is ok, it just 
results in an unneeded refresh
         // request. There may also be tablets that had a location when the 
files were set but do not
         // have a location now, that is ok the next time that tablet loads 
somewhere it will see the
         // files.
         refreshesNeeded = tablets.stream()
             .filter(tabletMetadata -> tabletMetadata.getLocation() != null
                 && tabletMetadata.getLoaded().containsValue(tid))
             .collect(groupingBy(TabletMetadata::getLocation,
                 mapping(TabletMetadata::getExtent, toList())));
       }
       HashSet<Location> completed = new HashSet<>();
   
       while (!refreshesNeeded.isEmpty()) {
         for (Map.Entry<Location,List<KeyExtent>> entry : 
refreshesNeeded.entrySet()) {
           try {
             // TODO use a thread pool to send RPC request
             // Ask tablet server to reload the metadata for these tablets. If 
the tabletserver is no
             // longer serving a tablet that is ok as long is reloads the ones 
that it is
             // still serving. If a tablet was unloaded after the map was 
created above when it
             // reloads elsewhere it will see the bulk files. The tserver will 
need to ensure tablets
             // that had future location and are in the process of loading are 
handled correctly or
             // the refresh request could be missed due to race conditions in 
the tablet server.
             sendSyncRefreshRequest(entry.getKey(), entry.getValue());
             // refresh request returned successfully, so mark the location as 
completed
             completed.add(entry.getKey());
           } catch (Exception e) { // TODO what exceptions are ok to catch and 
retry on? this
                                   // exception is too broad
             if (tserverIsDead(entry.getKey())) {
               // if the tserver is no longer alive then when tablets load 
elsewhere they will see the files, so mark it completed
               completed.add(entry.getKey());
             } else {
               // TODO log something about waiting on this tserver for bulk 
import refresh request
             }
           }
         }
   
         // remove all tablet server that were successfully refreshed
         refreshesNeeded.keySet().removeAll(completed);
   
         if (!refreshesNeeded.isEmpty()) {
           // TODO sleep and log using a retry object
         }
       }
   
       return new CleanUpBulkImport(bulkInfo);
     }   
   }
   ```
   
   Learned a few things from this.  Will probably need a thread pool and will 
need to handle tservers that die after getting the information from the 
metadata table. I also realized there could be some tricky issues with future 
locations with this approach, but those should be ok also long as the tserver 
handles concurrent tablet loads and refresh request correctly.  I can't think 
of any cases with this approach where a tablet that needs to refresh does not. 
   
   Thinking of switching to this approach.  It has pluses and minuses compared 
to the refresh column.  Thinking its simpler to start with, likely faster, and 
if any of the problems the refresh column addresses show up then we can 
reconsider it later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to