ctubbsii commented on issue #1669: URL: https://github.com/apache/accumulo/issues/1669#issuecomment-667412940
I worry that automatic merging could let people shoot themselves in the foot, so to speak, with respect to performance. Any time Accumulo does something "automatically", it can surprise users. For example, users could unintentionally configure it to delete empty tablets and then lose all the benefits of pre-splitting tables at creation time to prepare for efficient distributed bulk ingest.

Baking this feature into Accumulo would also add significant complexity to Accumulo's own code, without much benefit over running it as a client-side process. It seems to me that a client-side process could easily trigger merges as needed, without introducing this complexity inside Accumulo, and it would perform just as well.

In general, I'd prefer to reduce complexity in Accumulo and modularize functionality, unless a clear benefit justifies baking something in.

What do you think? Do you see a clear benefit to having it baked in rather than having a client-side process perform this function?
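To make the client-side alternative concrete, here is a minimal sketch of the planning step such a process might run. It is an illustration only, not anything taken from Accumulo: the `EmptyTabletMergePlanner`, `Tablet`, and `MergeRange` names are hypothetical, and it assumes the client has already gathered per-tablet entry counts by some means of its own. It only proposes merging runs of two or more adjacent empty tablets, and it calls no Accumulo API itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side planner: finds runs of adjacent empty tablets worth merging. */
final class EmptyTabletMergePlanner {

    /** One tablet, in table order: its end row (null for the last tablet) and its entry count. */
    static final class Tablet {
        final String endRow;
        final long entries;

        Tablet(String endRow, long entries) {
            this.endRow = endRow;
            this.entries = entries;
        }
    }

    /** Rows after startExclusive up to and including endInclusive; null means unbounded. */
    static final class MergeRange {
        final String startExclusive;
        final String endInclusive;

        MergeRange(String startExclusive, String endInclusive) {
            this.startExclusive = startExclusive;
            this.endInclusive = endInclusive;
        }
    }

    /** Returns one range per run of at least two consecutive empty tablets. */
    static List<MergeRange> plan(List<Tablet> tablets) {
        List<MergeRange> ranges = new ArrayList<>();
        int i = 0;
        while (i < tablets.size()) {
            if (tablets.get(i).entries != 0) {
                i++;
                continue;
            }
            // Extend the run while the next tablet is also empty.
            int j = i;
            while (j + 1 < tablets.size() && tablets.get(j + 1).entries == 0) {
                j++;
            }
            // A single empty tablet is left alone; merging it by itself changes nothing.
            if (j > i) {
                String start = (i == 0) ? null : tablets.get(i - 1).endRow;
                ranges.add(new MergeRange(start, tablets.get(j).endRow));
            }
            i = j + 1;
        }
        return ranges;
    }
}
```

A client could then pass each planned range to `TableOperations.merge(tableName, start, end)`, converting the row strings to `Text` and leaving null bounds as null. Because the process runs outside Accumulo, the user chooses which tables it touches, so freshly pre-split tables can simply be left out.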
