keith-turner commented on code in PR #3424:
URL: https://github.com/apache/accumulo/pull/3424#discussion_r1207211859


##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -278,33 +278,34 @@ public static Map<KeyExtent,Long> 
estimateSizes(AccumuloConfiguration acuConf,
 
     Text row = new Text();
 
-    FileSKVIterator index =
-        FileOperations.getInstance().newIndexReaderBuilder().forFile(dataFile, 
ns, ns.getConf(), cs)
-            
.withTableConfiguration(acuConf).withFileLenCache(fileLenCache).build();
+    List<KeyExtent> sortedKeyExtents = new ArrayList<>(counts.keySet());
 
-    try {
+    try (FileSKVIterator index =
+        FileOperations.getInstance().newIndexReaderBuilder().forFile(dataFile, 
ns, ns.getConf(), cs)
+            
.withTableConfiguration(acuConf).withFileLenCache(fileLenCache).build()) {
       while (index.hasTop()) {
         Key key = index.getTopKey();
         totalIndexEntries++;
         key.getRow(row);
 
-        // TODO this could use a binary search
-        for (Entry<KeyExtent,MLong> entry : counts.entrySet()) {
-          if (entry.getKey().contains(row)) {
-            entry.getValue().l++;
+        int start = 0, end = sortedKeyExtents.size() - 1;

Review Comment:
   The comment may have been referring to leveraging the `counts` TreeMap to do 
a binary search.   The following example illustrates some concepts that may be 
used to do this.  I ran it locally and it prints what I would expect.
   
   ```
       static KeyExtent nke(String er, String per) {
           return new KeyExtent(TableId.of("1"), er == null ? null : new 
Text(er), per == null ? null : new Text(per));
       }
   
       public static void main(String[] args) {
           TreeMap<KeyExtent, String> tm = new TreeMap<>();
   
           tm.put(nke("c",null), "1");
           tm.put(nke("f","c"), "2");
           tm.put(nke("m","f"), "3");
           tm.put(nke("ma","m"), "4");
           tm.put(nke(null,"ma"), "5");
   
           System.out.println("1 "+tm.ceilingEntry(nke("b",null)).getValue());
           System.out.println("1 "+tm.ceilingEntry(nke("c",null)).getValue());
           System.out.println("2 "+tm.ceilingEntry(nke("d",null)).getValue());
           System.out.println("2 "+tm.ceilingEntry(nke("f",null)).getValue());
           System.out.println("3 "+tm.ceilingEntry(nke("g",null)).getValue());
           System.out.println("3 "+tm.ceilingEntry(nke("m",null)).getValue());
           System.out.println("4 "+tm.ceilingEntry(nke("m ",null)).getValue());
           System.out.println("4 "+tm.ceilingEntry(nke("ma",null)).getValue());
           System.out.println("5 "+tm.ceilingEntry(nke("mb",null)).getValue());
           System.out.println("5 "+tm.ceilingEntry(nke(null,null)).getValue());
       }
   ```
   
   Always setting prevEndRow to null for the lookup exent in the example above 
is something I think is done elsewhere in the Accumulo code, but not sure.  I 
looked at the comparator for KeyExtent and I think its correct, but it would be 
good to double check that.
   
   In the example above the extents go from -inf to +inf with no gaps and no 
overlap of the extent. So that example does not capture the complexity of this 
code. The existing loop in the code handles gaps and overlaps w/o issue because 
it checks each extent for overlap.  I don't know the answer to this, but could 
we use the tailMap (and maybe headMap) functions of TreeMap  to efficiently 
find the extents that overlap?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to