keith-turner opened a new issue, #5272: URL: https://github.com/apache/accumulo/issues/5272
**Is your feature request related to a problem? Please describe.**

In #4898 a new mechanism was added to RFile to compute bulk import load plans as the RFile is written. This mechanism was implemented with completely new code that examines each key/value written. Existing code in RFile might be leveraged for this computation, which could reduce the amount of work done per key/value written.

**Describe the solution you'd like**

Determine if this [code](https://github.com/apache/accumulo/blob/139d850e850277cfc0fd5e0da15abe1467b8fa5c/core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java#L473-L533) could be modified to help compute the load plan by leveraging its tracking of first and last keys. Can that code be modified to minimize the total amount of per key/value work that the RFile write pipeline is doing?

**Describe alternatives you've considered**

It may be best to not make any changes at all for this issue; it needs investigation. The following are some reasons why no changes might be warranted.

1. The performance impact of the per-key examination added in #4898 is negligible compared to other parts of the RFile write pipeline. Optimizing something that is not taking much time will not really speed up the overall write pipeline. The slowest parts need to be optimized to see a measurable improvement.
2. The existing code is not well suited for the new task.
3. There are too many existing layers of abstraction that would need to be broken to make the change.

Only want to make this change if it gives a measurable performance improvement and does not add tech debt to the code.
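
To make the idea of leveraging first and last key tracking a bit more concrete, here is a rough, self-contained sketch. It is not Accumulo code: `BlockLoadPlanSketch`, `append`, `closeBlock`, and `plan` are hypothetical names, rows are plain strings, and the table splits are assumed to be known up front. It relies only on the fact that keys are appended to an RFile in sorted order, so the per key work shrinks to remembering the first and last row of a block, and tablets are resolved once per block.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in Accumulo.
final class BlockLoadPlanSketch {
  // Table split points (tablet end rows); assumed to be known up front.
  private final TreeSet<String> splits;
  // Tablets that should receive the file, in the order they were found.
  private final Set<String> tablets = new LinkedHashSet<>();
  private String firstRow; // first row of the block being written
  private String lastRow;  // last row of the block being written

  BlockLoadPlanSketch(TreeSet<String> splits) {
    this.splits = new TreeSet<>(splits);
  }

  // Per key/value work: at most two field assignments, no split lookup.
  void append(String row) {
    if (firstRow == null) {
      firstRow = row;
    }
    lastRow = row;
  }

  // Per block work: walk the tablets spanning [firstRow, lastRow].
  void closeBlock() {
    if (firstRow == null) {
      return; // empty block, nothing to record
    }
    String prev = splits.lower(firstRow);  // greatest split < firstRow, or null
    String end = splits.ceiling(firstRow); // smallest split >= firstRow, or null
    while (true) {
      tablets.add(describe(prev, end));
      if (end == null || end.compareTo(lastRow) >= 0) {
        break; // this tablet contains lastRow
      }
      prev = end;
      end = splits.higher(end);
    }
    firstRow = null;
    lastRow = null;
  }

  // Tablets collected so far, each rendered as "(prevEndRow, endRow]".
  List<String> plan() {
    return new ArrayList<>(tablets);
  }

  private static String describe(String prev, String end) {
    return "(" + (prev == null ? "-inf" : prev) + ", "
        + (end == null ? "+inf" : end) + "]";
  }
}
```

One trade-off in this sketch: when a block spans several tablets, every tablet between the first and last row is included, even if no key in the block falls in it. Whether that is acceptable, and whether the saved per key work is measurable at all, is part of what would need investigation.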
