ctubbsii commented on issue #1669:
URL: https://github.com/apache/accumulo/issues/1669#issuecomment-752705070


   > Is the underlying problem the performance of merging empty tablets on an 
active table? If so, I wonder if it would be possible to add an option to Merge 
a range of empty tablets and not have to lock the table.
   
   I don't think you can avoid locking the table in some way, but we may be 
able to add some sort of range lock.
   
   I was discussing the performance bottlenecks of merging with @EdColeman 
yesterday, and I pointed out that the biggest problem is chop-compactions, 
which truncate any non-empty tablets involved in the merge before completing 
the merge. This can be avoided in a special case if all sequential empty 
tablets being merged are merged into a single empty tablet, rather than merged 
with the adjacent non-empty one. This would avoid many HDFS operations and 
file IO in that special case. In the general case, chop compactions can be 
avoided by storing range constraints per-file to match the original tablet in 
which the file was specified, as described in #1327.
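
   A minimal sketch of the special case above, written as plain planning 
logic rather than Accumulo code. The `Tablet` record, its `file_count` field, 
and `plan_empty_merges` are hypothetical names for illustration, and treating 
"no files" as "empty" is an assumption (a real check would also need to 
consider unflushed data). Each run of two or more consecutive empty tablets 
becomes one merge range that stops short of the neighboring non-empty tablets, 
so no non-empty tablet is involved in the merge.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tablet:
    """Hypothetical tablet summary covering (prev_end_row, end_row].

    None means the range is unbounded on that side.
    """
    prev_end_row: Optional[str]
    end_row: Optional[str]
    file_count: int  # assumption: 0 files is treated as "empty"


def plan_empty_merges(
    tablets: List[Tablet],
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return one (start_exclusive, end_inclusive) range per run of two or
    more consecutive empty tablets. Tablets must be in row order."""
    plans = []
    run: List[Tablet] = []
    # The trailing None is a sentinel that flushes the final run.
    for t in tablets + [None]:
        if t is not None and t.file_count == 0:
            run.append(t)
            continue
        if len(run) >= 2:
            # The range spans only the empty tablets in this run.
            plans.append((run[0].prev_end_row, run[-1].end_row))
        run = []
    return plans


example = [
    Tablet(None, "b", 3),
    Tablet("b", "d", 0),
    Tablet("d", "f", 0),
    Tablet("f", "h", 0),
    Tablet("h", "k", 2),
    Tablet("k", "m", 0),
    Tablet("m", None, 0),
]
print(plan_empty_merges(example))
```

   In a client-side utility, each planned range could then be handed to the 
table's merge operation; how the start and end rows map onto that call's 
boundary semantics should be checked against the API documentation.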
   
   Eliminating chop compactions would effectively make merges a metadata-only 
operation, with no file IO, which would eliminate a lot of the performance 
issues people have had with merging.
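
   To illustrate what "metadata-only" could mean here, the following is a toy 
model, not the design in the referenced issue and not Accumulo's metadata 
schema. `FileRef`, `Tablet`, and `merge_metadata_only` are hypothetical names. 
The assumption is that each file reference carries a row range; merging then 
narrows each reference to the range of the tablet it came from and 
concatenates the references, without reading or rewriting any file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileRef:
    """Hypothetical file reference limited to rows in (start, end]."""
    path: str
    start: Optional[str]  # exclusive lower bound, None = unbounded
    end: Optional[str]    # inclusive upper bound, None = unbounded


@dataclass
class Tablet:
    """Hypothetical tablet covering (prev_end_row, end_row]."""
    prev_end_row: Optional[str]
    end_row: Optional[str]
    files: List[FileRef]


def _tighter_start(a: Optional[str], b: Optional[str]) -> Optional[str]:
    # None means unbounded below, so any concrete bound is tighter.
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def _tighter_end(a: Optional[str], b: Optional[str]) -> Optional[str]:
    # None means unbounded above, so any concrete bound is tighter.
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def merge_metadata_only(tablets: List[Tablet]) -> Tablet:
    """Merge adjacent tablets (given in row order) by rewriting references.

    Each file keeps a range no wider than its source tablet, so rows that
    were outside that tablet stay hidden after the merge.
    """
    merged = [
        FileRef(
            f.path,
            _tighter_start(f.start, t.prev_end_row),
            _tighter_end(f.end, t.end_row),
        )
        for t in tablets
        for f in t.files
    ]
    return Tablet(tablets[0].prev_end_row, tablets[-1].end_row, merged)
```

   In this model, a file shared by two tablets simply appears twice in the 
merged tablet with two different ranges, which is the job a chop compaction 
would otherwise do by rewriting the file.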
   
   
   As for this issue, I still think automatic merging strategies are best kept 
in user utility code outside of Accumulo's code base, even if it's just getting 
rid of empty tablets. It's hard to infer user intentions to do anything 
automatic, and it adds too much complexity to support the user specifying their 
intentions in some sort of pluggable mechanism, with no substantial value over 
a fully client-side utility.



