ctubbsii commented on a change in pull request #2181:
URL: https://github.com/apache/accumulo/pull/2181#discussion_r665638698
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -56,15 +56,20 @@
   private final List<Scanner> scanners;
   private final Iterator<Entry<Key,Value>> iter;
 
+  public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs,
+      boolean checkFirstKEy) throws IOException {
+    this(context, recoveryLogDirs, null, null, checkFirstKEy);
+  }
+
   /**
    * Scans the files in each recoveryLogDir over the range [start,end].
    */
-  RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, LogFileKey start,
+  public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, LogFileKey start,
       LogFileKey end, boolean checkFirstKey) throws IOException {
     List<Iterator<Entry<Key,Value>>> iterators = new ArrayList<>(recoveryLogDirs.size());
     scanners = new ArrayList<>();
-    Range range = LogFileKey.toRange(start, end);
+    Range range = start != null ? LogFileKey.toRange(start, end) : null;
Review comment:
This is more readable without the negations:
```suggestion
Range range = start == null ? null : LogFileKey.toRange(start, end);
```
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogReader.java
##########
@@ -35,11 +36,13 @@
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.TableId;
import org.apache.accumulo.core.dataImpl.KeyExtent;
+import org.apache.accumulo.server.ServerContext;
+import org.apache.accumulo.server.fs.VolumeManager;
import org.apache.accumulo.server.fs.VolumeManagerImpl;
import org.apache.accumulo.start.spi.KeywordExecutable;
import org.apache.accumulo.tserver.log.DfsLogger;
import org.apache.accumulo.tserver.log.DfsLogger.LogHeaderIncompleteException;
-import org.apache.accumulo.tserver.log.RecoveryLogReader;
Review comment:
Is the old RecoveryLogReader not used after this? Can it be deleted?
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -56,15 +56,20 @@
   private final List<Scanner> scanners;
   private final Iterator<Entry<Key,Value>> iter;
 
+  public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs,
+      boolean checkFirstKEy) throws IOException {
+    this(context, recoveryLogDirs, null, null, checkFirstKEy);
+  }
Review comment:
It seems like this new constructor was added for convenience, but it
doesn't have a javadoc to explain how it differs from the other constructor. If
your goal is to make it crystal clear when callers need these additional
parameters and when they don't, then instead of constructors you could use
static factory methods with descriptive names in the API.
So, instead of the following in the calling code:
```java
var iter = new RecoveryLogsIterator(context, dirs, start, stop, false);
var iter2 = new RecoveryLogsIterator(context, dirs, true);
```
it could look like:
```java
var iter = RecoveryLogsIterator.scanRange(context, dirs, start, stop, false);
var iter2 = RecoveryLogsIterator.scanAll(context, dirs); // don't think you need the boolean for this case
```
Then, you can make the constructor private (or package-private, if you need
it visible for testing), and the static factory methods can do some
additional parameter validation to ensure that when you specify a range,
neither endpoint is null.
These are just general design tips, though. For this, since the new
constructor is only ever used once, I'd probably just remove it and pass in
nulls on the old constructor for the one time it is called. Minimally, if
you're keeping the new constructor, it should have a javadoc (and you should
double check to see if you actually need to pass the boolean, since I don't
think it matters when the first key is null).
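The factory-method pattern described above can be sketched in miniature. This is a self-contained illustration only: the `scanRange`/`scanAll` names come from the suggestion above, but the `String` stand-ins for `LogFileKey` and `Path`, and the `describe()` helper, are hypothetical, not the Accumulo API.

```java
import java.util.List;
import java.util.Objects;

// Simplified sketch of the static-factory design the review suggests.
final class RecoveryLogsSketch {
    private final List<String> dirs;
    private final String start; // null means an unbounded scan
    private final String end;

    // Private constructor: every caller must go through a named factory.
    private RecoveryLogsSketch(List<String> dirs, String start, String end) {
        this.dirs = dirs;
        this.start = start;
        this.end = end;
    }

    /** Scans the given range; both endpoints must be supplied. */
    static RecoveryLogsSketch scanRange(List<String> dirs, String start, String end) {
        Objects.requireNonNull(start, "start required for a range scan");
        Objects.requireNonNull(end, "end required for a range scan");
        return new RecoveryLogsSketch(dirs, start, end);
    }

    /** Scans everything; no range endpoints needed. */
    static RecoveryLogsSketch scanAll(List<String> dirs) {
        return new RecoveryLogsSketch(dirs, null, null);
    }

    /** Illustrative helper so the two modes are observable. */
    String describe() {
        return start == null ? "all" : "range[" + start + "," + end + "]";
    }
}
```

Because the constructor is private, the null-checks in `scanRange` cannot be bypassed, and the call sites read as intent (`scanAll` vs. `scanRange`) rather than as a boolean-and-nulls puzzle.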
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogReader.java
##########
@@ -123,12 +127,16 @@ public void execute(String[] args) throws Exception {
     Set<Integer> tabletIds = new HashSet<>();
 
     for (String file : opts.files) {
-
       Path path = new Path(file);
       LogFileKey key = new LogFileKey();
       LogFileValue value = new LogFileValue();
 
       if (fs.getFileStatus(path).isFile()) {
+        if (file.endsWith(".rf")) {
+          log.error("Can not read from a single rfile. Please pass in a directory for recovery logs.");
+          continue;
+        }
         // read log entries from a simple hdfs file
         try (final FSDataInputStream fsinput = fs.open(path);
             DataInputStream input = DfsLogger.getDecryptingStream(fsinput, siteConfig)) {
Review comment:
Given this error message and new behavior, it's not clear what this
condition (single file, but not ending in ".rf") is supposed to do. Is this for
reading the older sequence files? A comment could go a long way here. The only
comment it has is "read log entries from a simple hdfs file". Clearly, that
can't be read from a simple RFile in HDFS, so that comment could be updated to
make it clear what files it *can* read (i.e. what does "simple hdfs file"
mean?).
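The branching the review is asking to have documented can be sketched as a small classifier. Everything here is hypothetical: the class and case names are stand-ins, and the reading of "simple hdfs file" as an unsorted write-ahead log is an assumption about LogReader's intent, not something stated in the thread.

```java
// Hypothetical sketch of the dispatch LogReader performs, with the kind of
// clarifying comments the review asks for; names are illustrative only.
final class WalInputKind {
    static String classify(boolean isFile, String name) {
        if (isFile) {
            if (name.endsWith(".rf")) {
                // A single sorted RFile can't be replayed on its own; the
                // caller should pass the whole sorted-recovery directory.
                return "reject-single-rfile";
            }
            // Assumed meaning of "simple hdfs file": a plain (unsorted)
            // write-ahead log written directly to HDFS.
            return "plain-wal-file";
        }
        // A directory of sorted recovery logs.
        return "recovery-dir";
    }
}
```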
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogReader.java
##########
@@ -123,12 +127,16 @@ public void execute(String[] args) throws Exception {
     Set<Integer> tabletIds = new HashSet<>();
 
     for (String file : opts.files) {
-
       Path path = new Path(file);
       LogFileKey key = new LogFileKey();
       LogFileValue value = new LogFileValue();
 
       if (fs.getFileStatus(path).isFile()) {
+        if (file.endsWith(".rf")) {
Review comment:
There's probably a constant for the RFile extension somewhere that could
be used... not that it would change.
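The constant idea can be sketched as follows; `RFILE_SUFFIX` is a local stand-in, since the thread doesn't confirm which constant (if any) Accumulo core exposes for the RFile extension.

```java
// RFILE_SUFFIX is a hypothetical local constant; if Accumulo core already
// defines the RFile extension somewhere, that should be reused instead.
final class RecoveryFileCheck {
    static final String RFILE_SUFFIX = ".rf";

    static boolean isRFile(String fileName) {
        return fileName.endsWith(RFILE_SUFFIX);
    }
}
```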
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]