keith-turner commented on code in PR #3424:
URL: https://github.com/apache/accumulo/pull/3424#discussion_r1207218277
##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -278,33 +278,34 @@ public static Map<KeyExtent,Long>
estimateSizes(AccumuloConfiguration acuConf,
Text row = new Text();
- FileSKVIterator index =
- FileOperations.getInstance().newIndexReaderBuilder().forFile(dataFile,
ns, ns.getConf(), cs)
-
.withTableConfiguration(acuConf).withFileLenCache(fileLenCache).build();
+ List<KeyExtent> sortedKeyExtents = new ArrayList<>(counts.keySet());
- try {
+ try (FileSKVIterator index =
+ FileOperations.getInstance().newIndexReaderBuilder().forFile(dataFile,
ns, ns.getConf(), cs)
+
.withTableConfiguration(acuConf).withFileLenCache(fileLenCache).build()) {
while (index.hasTop()) {
Key key = index.getTopKey();
totalIndexEntries++;
key.getRow(row);
- // TODO this could use a binary search
- for (Entry<KeyExtent,MLong> entry : counts.entrySet()) {
- if (entry.getKey().contains(row)) {
- entry.getValue().l++;
+ int start = 0, end = sortedKeyExtents.size() - 1;
Review Comment:
> The biggest problem here is you are defeating the entire purpose by
creating a new ArrayList.
I don't think creating the ArrayList was necessarily a perf problem because
it was done once outside of a loop and the binary search was done many times
inside a loop.
@DomGarguilo I unresolved this in case you wanted to look into using tailMap
in the TreeMap. Can always resolve this comment if not interested in looking
into that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]