ctubbsii commented on issue #1664: URL: https://github.com/apache/accumulo/issues/1664#issuecomment-666676241
I'm curious whether overall throughput depends on how many of the candidates still have references in the metadata table. For example, if an 8MB batch turns out to be mostly candidates that are still in use, and only a few are actually available for deletion, does performance drop on the DFS calls that actually delete the files? Would a larger batch size make that later phase more efficient once the in-use candidates have been removed? Perhaps you can provide some timing information for a few scenarios. If throughput doesn't differ substantially across the scenarios, it's probably not worth adding a new property to make this configurable.
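
To make the question concrete, here is a rough sketch of a toy cost model for comparing scenarios before collecting real timings. Everything in it is an assumption for illustration: the function name `estimate_gc_time`, the fixed per-batch overhead, and the per-file delete cost are hypothetical placeholders rather than Accumulo APIs or measured values, and it counts candidates per batch instead of bytes to keep the arithmetic simple.

```python
import math


def estimate_gc_time(total_candidates, batch_size, in_use_fraction,
                     per_batch_overhead_s, per_delete_s):
    """Toy model of one GC pass (hypothetical; not Accumulo code).

    total_candidates     -- number of deletion candidates to process
    batch_size           -- candidates handled per batch (assumed count, not bytes)
    in_use_fraction      -- share of candidates still referenced in metadata
    per_batch_overhead_s -- assumed fixed cost per batch (scan + confirm)
    per_delete_s         -- assumed cost of one DFS delete call
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0.0 <= in_use_fraction <= 1.0:
        raise ValueError("in_use_fraction must be between 0 and 1")

    # Each batch pays the fixed overhead regardless of how many
    # candidates survive the in-use check.
    batches = math.ceil(total_candidates / batch_size)

    # Only candidates with no remaining references reach the delete phase.
    deletable = round(total_candidates * (1.0 - in_use_fraction))

    seconds = batches * per_batch_overhead_s + deletable * per_delete_s
    throughput = deletable / seconds if seconds > 0 else 0.0
    return {
        "batches": batches,
        "deletable": deletable,
        "seconds": seconds,
        "deleted_per_second": throughput,
    }


if __name__ == "__main__":
    # Same workload, two batch sizes, mostly in-use candidates.
    for size in (100, 500):
        print(size, estimate_gc_time(1000, size, 0.9, 1.0, 0.01))
```

Under this model a larger batch only helps when the fixed per-batch overhead dominates, so the useful measurements are the real values of those two costs at a few in-use ratios.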
