keith-turner commented on code in PR #3401:
URL: https://github.com/apache/accumulo/pull/3401#discussion_r1197958044


##########
core/src/main/java/org/apache/accumulo/core/metadata/AbstractTabletFile.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata;
+
+import java.util.Objects;
+
+import org.apache.hadoop.fs.Path;
+
+public abstract class AbstractTabletFile<T extends AbstractTabletFile<T>> 
implements Comparable<T> {

Review Comment:
   This is a half baked thought that I am posting just for discussion, not 
looking for any changes in this PR.
   
   Currently we can write possibly dubious code like the following because 
StoredTabletFile is a child type of TabletFile.
   
   ```java
   Set<TabletFile> aSet = ...
   aSet.add(new StoredTabletFile(..));
   aSet.containst(new StoredTabletFile(..));
   ```
   
   If we made StoredTabletFile a sibling type of TabletFile by making 
StoredTabletFile extend AbstractTabletFile, I am wondering if it would buy us 
anything.  It would make the code example above not compile (the add() call 
will not compile, the contains call would warn).  Wondering if that would help 
improve code correctness or just be a waste of time, not sure w/o experimenting 
with the change.
   
   I noticed UnassignedTabletFile was a sibling of TabletFile and that along 
with #2830 led me to wonder about the usefulness of making StoredTabletFile a 
sibling too.
   



##########
core/src/main/java/org/apache/accumulo/core/metadata/UnassignedTabletFile.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Objects;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+public class UnassignedTabletFile extends 
AbstractTabletFile<UnassignedTabletFile> {

Review Comment:
   You mentioned you were not sure about the name.  One thing I have found that 
helps me with naming sometimes is to write the documentation and see what words 
I use in the docs.  I went through that exercise and found I was using the word 
referenced, so that could be a candidate for the name.  This is also aligns 
with the terminology used in the accumulo GC source, it calls files in tablets 
references.
   
   ```suggestion
   /**
    *  A file that is not intended to be added to a tablet as a reference, 
within the scope of the code using this class, but needs to be passed to code 
that processes tablet files.  These files could be temp files or files directly 
created by a user for bulk import.  The file may ultimately be added to a 
tablet later as a new file reference, but within a different scope (process, 
thread, code block, method, etc) that uses a different class to represent the 
file in that scope.
    */
   public class UnreferencedTabletFile extends 
AbstractTabletFile<UnreferencedTabletFile> {
   ```



##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -94,12 +93,12 @@ public boolean equals(Object obj) {
 
     @Override
     public int hashCode() {
-      return fileName.hashCode();
+      return file.hashCode();
     }
   }
 
-  private Map<String,List<OpenReader>> openFiles;
-  private HashMap<FileSKVIterator,String> reservedReaders;
+  private Map<TabletFile,List<OpenReader>> openFiles;

Review Comment:
   Anytime we use TabletFile as key we have to be wary of #2830.  I think this 
is probably ok for now, but future changes to StoredTabletFile equals and 
hashcode could break this.



##########
core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java:
##########
@@ -155,4 +142,9 @@ public int hashCode() {
   public String toString() {
     return normalizedPath;
   }
+
+  public static TabletFile of(final Path path) {

Review Comment:
   What motivated this new method? Could we eventually make the constructor 
private?



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -548,22 +548,22 @@ public SortedMap<KeyExtent,Bulk.Files> 
computeFileToTabletMappings(FileSystem fs
         context.instanceOperations().getSystemConfiguration(), tableProps, 
tableId);
 
     for (FileStatus fileStatus : files) {
-      Path filePath = fileStatus.getPath();
+      UnassignedTabletFile file = UnassignedTabletFile.of(fs, 
fileStatus.getPath());
       CompletableFuture<Map<KeyExtent,Bulk.FileInfo>> future = 
CompletableFuture.supplyAsync(() -> {
         try {
           long t1 = System.currentTimeMillis();
           List<KeyExtent> extents =
-              findOverlappingTablets(context, extentCache, filePath, fs, 
fileLensCache, cs);
+              findOverlappingTablets(context, extentCache, file, fs, 
fileLensCache, cs);
           // make sure file isn't going to too many tablets
-          checkTabletCount(maxTablets, extents.size(), filePath.toString());
-          Map<KeyExtent,Long> estSizes = 
estimateSizes(context.getConfiguration(), filePath,
+          checkTabletCount(maxTablets, extents.size(), file.toString());
+          Map<KeyExtent,Long> estSizes = 
estimateSizes(context.getConfiguration(), file,
               fileStatus.getLen(), extents, fs, fileLensCache, cs);
           Map<KeyExtent,Bulk.FileInfo> pathLocations = new HashMap<>();
           for (KeyExtent ke : extents) {
-            pathLocations.put(ke, new Bulk.FileInfo(filePath, 
estSizes.getOrDefault(ke, 0L)));
+            pathLocations.put(ke, new Bulk.FileInfo(file.getPath(), 
estSizes.getOrDefault(ke, 0L)));

Review Comment:
   Looked into passing TabletFile to FileInfo.  FileInfo is used for 
serialization, so internally it needs a string.  Could possible pass the tablet 
file obj have it do the conversion in the constructor. 



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java:
##########
@@ -55,7 +55,7 @@ public class RFileOperations extends FileOperations {
 
   private static RFile.Reader getReader(FileOptions options) throws 
IOException {
     CachableBuilder cb = new CachableBuilder()
-        .fsPath(options.getFileSystem(), new Path(options.getFilename()), 
options.dropCacheBehind)
+        .fsPath(options.getFileSystem(), options.getFile().getPath(), 
options.dropCacheBehind)

Review Comment:
   Interesting how few changes there are to this class.



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -261,9 +262,9 @@ public MLong(long i) {
     long l;
   }
 
-  public static Map<KeyExtent,Long> estimateSizes(AccumuloConfiguration 
acuConf, Path dataFile,
-      long fileSize, Collection<KeyExtent> extents, FileSystem ns, 
Cache<String,Long> fileLenCache,
-      CryptoService cs) throws IOException {
+  public static Map<KeyExtent,Long> estimateSizes(AccumuloConfiguration 
acuConf,
+      UnassignedTabletFile dataFile, long fileSize, Collection<KeyExtent> 
extents, FileSystem ns,
+      Cache<String,Long> fileLenCache, CryptoService cs) throws IOException {

Review Comment:
   I was wondering if we could use a better type than String for 
`Cache<String,Long>`.  Looked into the code a bit an I am not sure if we could 
use TabletFile.  If we can not use TabletFile we may be able to created a 
specialized CachableBlockFile.CacheId type to use in lieu of String..  This 
would be a follow on issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to