ctubbsii commented on pull request #2096: URL: https://github.com/apache/accumulo/pull/2096#issuecomment-842543283
@keith-turner wrote: > These RPCs are structured specifically for the way the rest of the distributed system (coordinator and compactors) works. Should these tserver RPCs be generalized? If so, how? Sorry, I don't know enough about the current implementation to understand the question yet, in order to answer it. The generalized interface I was thinking about was something like `BiFunction<Set<File>, CompactionConfig, Future<File>>`. I was thinking any necessary RPCs would be implementation-specific, and independent of Accumulo's RPCs. My preconceptions of what this would look like have been broken, so I would need to do some catch up understanding before I could have specific thoughts on how they could be improved. > > My understanding of the original idea to create an "external" compactions, was to create them "external" to Accumulo... something Accumulo could "submit" work to do and not care about the implementation, and get back a new compacted file. > > That was never an explicit goal I had. Okay, I guess I just had a different idea in mind of what "external" meant. > Is there another path other than what I have described? Does this path actually lower the complexity? I am not sure it would, I think the answer to that depends on what these new API/SPI calls are. If we went with the `BiFunction<Set<File>, CompactionConfig, Future<File>>` concept, or similar, then nothing internal need be exposed to it. The RFile config settings (locality groups, etc.) and iterators are already "public API". A lot of the compaction complexity, with the services, planners, executors, queues, workers, etc. could sit inside an implementation. Or, all that could be left as-is, and just expect the pluggable part to be the "last stage" after going through that management pipeline, which I think is basically how it is now... but not expressed as simply as I have with the "BiFunction" expression. I'm curious, do the external compactors support RFile encryption and other RFile-specific configs (like block sizes, locality groups, etc.) today? My guess is that they do, because it's still all the same Accumulo code, just running in a different process, but just curious how much you've exercised those while testing this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
