ctubbsii edited a comment on issue #1669: URL: https://github.com/apache/accumulo/issues/1669#issuecomment-752705070
> Is the underlying problem the performance of merging empty tablets on an active table? If so, I wonder if it would be possible to add an option to merge a range of empty tablets without having to lock the table.

I don't think you can avoid locking the table in some way, but we may be able to add some sort of range lock.

I was discussing the performance bottlenecks of merging with @EdColeman yesterday, and I pointed out that the biggest problem is chop compactions, which truncate the files of any non-empty tablet involved in the merge to that tablet's range before the merge completes. This can be avoided in a special case: if all sequential empty tablets being merged are merged into a single empty tablet, rather than merged with the adjacent non-empty one. In that special case, this would avoid a lot of HDFS operations and file IO.

In the general case, chop compactions can be avoided by storing range constraints per file that match the original tablet that referenced the file, as described in #1327. Eliminating chop compactions would effectively make merges a metadata-only operation, with no file IO, which would eliminate a lot of the performance issues people have had with merging.

As for this issue, I still think automatic merging strategies are best kept in user utility code outside of Accumulo's code base, even if it's just getting rid of empty tablets. It's hard to infer user intentions well enough to do anything automatically, and supporting users who specify their intentions through some sort of pluggable mechanism adds too much complexity, with no substantial value over a fully client-side utility.
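The empty-tablet special case is simple enough to plan entirely on the client side. What follows is a minimal sketch, not Accumulo code. The `Tablet` and `MergeRange` records and `planEmptyRuns` are hypothetical names. The sketch also assumes the caller has already determined which tablets are empty; how that is determined is out of scope here. It finds every run of two or more consecutive empty tablets and never includes a non-empty neighbor. Each resulting range could then be handed to `TableOperations.merge(tableName, start, end)`, assuming that method merges the tablets covering `(start, end]`. Check its Javadoc before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTabletMergePlanner {

    /** Hypothetical tablet descriptor: covers rows (prevEndRow, endRow]; null = unbounded. */
    record Tablet(String prevEndRow, String endRow, boolean empty) {}

    /** A row range (start, end] to merge; null = unbounded. */
    record MergeRange(String start, String end) {}

    /**
     * Returns one range per maximal run of two or more consecutive empty tablets, in table order.
     * Non-empty neighbors are never included, so these ranges contain no files to chop.
     */
    static List<MergeRange> planEmptyRuns(List<Tablet> tablets) {
        List<MergeRange> ranges = new ArrayList<>();
        int runStart = -1; // index of the first empty tablet in the current run, or -1 if none
        // Iterate one past the end so a run that reaches the last tablet is still closed.
        for (int i = 0; i <= tablets.size(); i++) {
            boolean empty = i < tablets.size() && tablets.get(i).empty();
            if (empty && runStart < 0) {
                runStart = i;
            } else if (!empty && runStart >= 0) {
                // The run spans indexes runStart..i-1; a run of one tablet has nothing to merge.
                if (i - runStart >= 2) {
                    ranges.add(new MergeRange(tablets.get(runStart).prevEndRow(), tablets.get(i - 1).endRow()));
                }
                runStart = -1;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Tablet> tablets = List.of(
            new Tablet(null, "b", false),
            new Tablet("b", "d", true),
            new Tablet("d", "f", true),
            new Tablet("f", "h", true),
            new Tablet("h", "j", false),
            new Tablet("j", "m", true),
            new Tablet("m", null, true));
        for (MergeRange r : planEmptyRuns(tablets)) {
            System.out.println("merge (" + r.start() + ", " + r.end() + "]");
        }
    }
}
```

Running `main` prints `merge (b, h]` and `merge (j, null]`. The three empty tablets in the middle collapse into one, and so do the two empty tablets at the end. The non-empty tablets at `b` and `j` are left alone, which is what keeps chop compactions out of the picture.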

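For the general case, here is a rough illustration of why per-file range constraints could make a merge metadata-only. It is a sketch under assumptions, not the design in #1327. `FileRef`, `TabletMeta`, `fence`, and `mergeAdjacent` are hypothetical names.

The assumption is that a tablet can reference a file holding rows outside its own range, for example a file still shared by both halves of an earlier split. That extra data is what a chop compaction removes today. If each file reference instead carries the range of the tablet it came from, merging two adjacent tablets only rewrites metadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class FencedFileMergeSketch {

    /** Hypothetical file reference: rows outside (startExclusive, endInclusive] are hidden; null = unbounded. */
    record FileRef(String path, String startExclusive, String endInclusive) {
        boolean covers(String row) {
            boolean afterStart = startExclusive == null || row.compareTo(startExclusive) > 0;
            boolean atOrBeforeEnd = endInclusive == null || row.compareTo(endInclusive) <= 0;
            return afterStart && atOrBeforeEnd;
        }
    }

    /** Hypothetical tablet metadata: covers rows (prevEndRow, endRow]; null = unbounded. */
    record TabletMeta(String prevEndRow, String endRow, List<FileRef> files) {}

    /** Later of two exclusive start rows (null means negative infinity). */
    static String laterStart(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Earlier of two inclusive end rows (null means positive infinity). */
    static String earlierEnd(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Narrows a file's visible range to its intersection with the tablet's range. */
    static FileRef fence(FileRef f, TabletMeta t) {
        return new FileRef(f.path(),
            laterStart(f.startExclusive(), t.prevEndRow()),
            earlierEnd(f.endInclusive(), t.endRow()));
    }

    /** Metadata-only merge of two adjacent tablets: file references are re-fenced, files are never rewritten. */
    static TabletMeta mergeAdjacent(TabletMeta left, TabletMeta right) {
        if (left.endRow() == null || !Objects.equals(left.endRow(), right.prevEndRow())) {
            throw new IllegalArgumentException("tablets are not adjacent");
        }
        List<FileRef> files = new ArrayList<>();
        for (FileRef f : left.files()) files.add(fence(f, left));
        for (FileRef f : right.files()) files.add(fence(f, right));
        return new TabletMeta(left.prevEndRow(), right.endRow(), files);
    }

    public static void main(String[] args) {
        // Both halves of an earlier split still reference the same unbounded file.
        FileRef shared = new FileRef("shared.rf", null, null);
        TabletMeta left = new TabletMeta(null, "m", List.of(shared));
        TabletMeta right = new TabletMeta("m", null, List.of(shared));
        TabletMeta merged = mergeAdjacent(left, right);
        for (FileRef f : merged.files()) {
            System.out.println(f.path() + " visible for (" + f.startExclusive() + ", " + f.endInclusive() + "]");
        }
    }
}
```

Running `main` prints `shared.rf visible for (null, m]` and `shared.rf visible for (m, null]`. The merged tablet keeps two references to the same file, each fenced to its original half, so no row is exposed twice and no file IO is needed.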