dlmarion commented on pull request #2096: URL: https://github.com/apache/accumulo/pull/2096#issuecomment-842469896
In the current website PR I removed our original design document because @keith-turner and I had made several changes over the course of development. However, I still have the original documentation in another branch. I revisited our [original](https://github.com/dlmarion/accumulo-website/blob/external-compaction-design-capture-all-information/design/external-compaction.md) design and I think what we have today is closely aligned with what we envisioned at the beginning of this process.

Having already written the external compaction code, I'm not sure that a separate and independent service could be achieved without a significant set of new APIs being written for Accumulo. I think @keith-turner pointed to the planner being the place where someone can write their own compaction implementation. But I do agree that this initial implementation is tightly coupled to Accumulo internals. Maybe this is a stepping stone to a fully pluggable implementation. Looking at the coordinator should tell us what it needs in a public API to make it fully external.

Finally, my personal goal here was to move compactions out of the TServer for several reasons (below). The Compactor component can be run on different hardware than the TabletServers, and even run in Kubernetes using its dynamic pod scheduling feature to scale the number of Compactors up or down based on load.

* Allow compactions to outlive a TabletServer
* Allow compactions to occur concurrently with a Tablet being re-hosted
* Reduce the load on the TabletServer, giving it more cycles to insert mutations and respond to scans
* Allow compactions to be scaled differently than the number of TabletServers

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
