dlmarion commented on pull request #2096:
URL: https://github.com/apache/accumulo/pull/2096#issuecomment-842469896


   In the current website PR I removed our original design document because 
@keith-turner and I made several changes over the course of development. 
However, I still have the original documentation in another branch. I revisited 
our 
[original](https://github.com/dlmarion/accumulo-website/blob/external-compaction-design-capture-all-information/design/external-compaction.md)
 design, and I think what we have today is closely aligned with what we 
envisioned at the beginning of this process. Having had the benefit of already 
writing the external compaction code, I'm not sure that a separate, 
independent service could be achieved without writing a significant set of new 
APIs for Accumulo.
   
   I think @keith-turner pointed to the planner as the place where someone 
can plug in their own compaction implementation. But I do agree that this 
initial implementation is tightly coupled to Accumulo internals. Maybe it is 
a stepping stone to a fully pluggable implementation; looking at the 
coordinator should tell us what it needs in a public API to become fully 
external.
   
   Finally, my personal goal here was to move compactions out of the TServer 
for several reasons (below). The Compactor component can run on different 
hardware than the TabletServers, and can even run in Kubernetes, using its 
dynamic pod scheduling to scale the number of Compactors up or down based on 
load.
    
     * Allow compactions to outlive a TabletServer
     * Allow compactions to occur concurrently with a Tablet being re-hosted
     * Reduce the load on the TabletServer, giving it more cycles to insert 
mutations and respond to scans
     * Allow compactions to be scaled differently than the number of 
TabletServers
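
   To make the Kubernetes point concrete, here is a minimal, illustrative 
sketch of running Compactors as a Deployment scaled by a 
HorizontalPodAutoscaler. The image name, queue name, and resource numbers are 
hypothetical, not part of this PR; it only shows the shape of the idea that 
Compactor count can float independently of TabletServer count.

```yaml
# Illustrative only: Compactor pods for one hypothetical compaction queue "q1".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: accumulo-compactor-q1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: accumulo-compactor-q1
  template:
    metadata:
      labels:
        app: accumulo-compactor-q1
    spec:
      containers:
        - name: compactor
          image: example/accumulo:latest   # hypothetical image
          args: ["compactor", "-q", "q1"]  # hypothetical launch args
          resources:
            requests:
              cpu: "1"
---
# Scale Compactors up/down on CPU load, independent of TabletServers.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: accumulo-compactor-q1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: accumulo-compactor-q1
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```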


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
