keith-turner commented on a change in pull request #2368:
URL: https://github.com/apache/accumulo/pull/2368#discussion_r765924253
##########
File path: core/src/main/java/org/apache/accumulo/core/file/rfile/GenerateSplits.java
##########
@@ -201,7 +201,9 @@ public void execute(String[] args) throws Exception {
itemsSketch.update(row);
iterator.next();
}
- return itemsSketch.getQuantiles(numSplits);
+ Text[] items = itemsSketch.getQuantiles(numSplits + 2);
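For context on the `numSplits + 2` change: the DataSketches documentation for the classic quantiles `getQuantiles(int)` describes the result as evenly spaced by rank, with the first entry being the minimum and the last being the maximum (a value of 2 returns just the min and max). Assuming that contract holds for the `ItemsSketch` version used here, requesting `numSplits + 2` quantiles and dropping both ends would leave `numSplits` interior split points. The sketch below mimics this on an exact, sorted in-memory list. `evenlySpaced` and `interiorSplits` are hypothetical helpers, not DataSketches or Accumulo APIs.

```java
import java.util.Arrays;
import java.util.List;

public class SplitTrimSketch {

  // Hypothetical stand-in for getQuantiles(count) on exact data: returns `count`
  // items at evenly spaced ranks 0, 1/(count-1), ..., 1, so the first entry is the
  // minimum and the last entry is the maximum. Requires count >= 2.
  static String[] evenlySpaced(List<String> sorted, int count) {
    String[] out = new String[count];
    int last = sorted.size() - 1;
    for (int i = 0; i < count; i++) {
      // Map the fraction i/(count-1) onto an index in [0, last].
      int idx = (int) Math.round((double) i * last / (count - 1));
      out[i] = sorted.get(idx);
    }
    return out;
  }

  // Drop the min and max so that only interior split points remain.
  static String[] interiorSplits(String[] quantiles) {
    return Arrays.copyOfRange(quantiles, 1, quantiles.length - 1);
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i");
    int numSplits = 3;
    String[] quantiles = evenlySpaced(rows, numSplits + 2);
    System.out.println(Arrays.toString(quantiles)); // [a, c, e, g, i]
    System.out.println(Arrays.toString(interiorSplits(quantiles))); // [c, e, g]
  }
}
```

One edge case worth keeping in mind under this model: when there are fewer distinct rows than requested quantiles, neighbouring entries can repeat, so the trimmed array may contain duplicate split points.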
Review comment:
> I think checking the different outputs of the Datasketches API is a
> bit out of scope for this utility.
I was not proposing checking different outputs of the API at this point.
I was thinking more that it would be good to fully understand the functionality
behind the API and what its edge cases may be. The `.getRetainedItems()` API
seems to indicate that there may be an upper bound on how many splits the API
would track, but I am not sure about this. The class seems to buffer information
in memory, but I don't have a good sense of what limits, if any, there are on
how much it will buffer.
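On the buffering question, a rough model may help. In the classic DataSketches quantiles sketch, which the quantiles `ItemsSketch` belongs to as far as I understand, retained storage is not a fixed constant but grows only logarithmically with the stream length `n` for a given accuracy parameter `k`. The sketch keeps a base buffer of up to `2k` items plus `k` items for each occupied level, and the occupied levels correspond to the set bits of `n / (2k)`. The helper below reproduces that count from `k` and `n` alone. `retainedItems` is a hypothetical reimplementation for reasoning about bounds, not a call into the library, and the exact layout should be confirmed against the DataSketches version Accumulo depends on.

```java
public class RetainedItemsSketch {

  // Hypothetical model of the classic quantiles sketch's retained-item count:
  // a base buffer holding (n mod 2k) items, plus k items for every set bit in
  // n / 2k (each set bit is one occupied level).
  static long retainedItems(int k, long n) {
    long baseBuffer = n % (2L * k);
    long bitPattern = n / (2L * k);
    return baseBuffer + (long) k * Long.bitCount(bitPattern);
  }

  public static void main(String[] args) {
    int k = 128; // assumed accuracy parameter, for illustration only
    for (long n : new long[] {1_000L, 1_000_000L, 1_000_000_000L}) {
      System.out.println(n + " rows -> " + retainedItems(k, n) + " retained items");
    }
  }
}
```

Under this model, the count stays below roughly `2k + k * (log2(n / 2k) + 1)` items, so memory would be bounded by that count times the size of the retained `Text` rows rather than by the total number of rows scanned.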
--
This is an automated message from the Apache Git Service.