================
@@ -0,0 +1,236 @@
+================================
+ClangIR Code Duplication Roadmap
+================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document describes the general approach to code duplication in the ClangIR
+code generation implementation. It acknowledges specific problems with the
+current implementation, discusses strategies for mitigating the risk inherent in
+the current approach, and describes a general long-term plan for addressing the
+issue.
+
+Background
+==========
+
+The ClangIR code generation is very closely modeled after Clang's LLVM IR code
+generation, and we intend for the CIR produced to eventually be semantically
+equivalent to the LLVM IR produced when not going through ClangIR. However, we
+acknowledge that as the ClangIR implementation is under development, there will
+be differences in semantics, both because we have not yet implemented all
+features of the classic codegen and because the CIR dialect is still evolving
+and does not yet have a way to represent all of the necessary semantics.
+
+We have chosen to model the ClangIR code generation directly after the classic
+codegen, to the point of following identical code structure, using similar names
+and often duplicating the logic because this seemed to be the most certain path
+to producing equivalent results. Having such nearly identical code allows for
+direct comparison between the CIR codegen and the LLVM IR codegen to find what
+is missing or incorrect in the CIR implementation.
+
+However, we recognize that this is not a sustainable permanent solution. As
+bugs are fixed and new features are added to the classic codegen, the process of
+keeping the analogous CIR code up to date will be a purely manual process.
+
+Long term, we need a more sustainable approach.
+
+Current Strategy
+================
+
+Practical considerations require that we make steady progress towards a working
+implementation of ClangIR.
+This necessity is directly opposed to the goal of
+minimizing code duplication.
+
+For this reason, we have decided to accept a large amount of code duplication
+in the short term, even with the explicit understanding that this is producing
+a significant amount of technical debt as the project progresses.
+
+As the CIR implementation is developed, we often note small pieces of code that
+could be shared with the classic codegen if they were moved to a different part
+of the source, such as a shared utility class in some directory available to
+both codegen implementations or a related AST class.
+It is left to the discretion of the developer and reviewers to decide whether
+such refactoring should be done during the CIR development, or if it is
+sufficient to leave a comment in the code indicating this as an opportunity for
+future improvement. Because much of the current code is likely to change when
+the long-term code sharing strategy is complete, we will lean towards only
+implementing refactorings that make sense independent of the code sharing
+problem.
+
+We have discussed various ways that major classes such as CGCXXABI/CIRGenCXXABI
+could be refactored to allow parts of their implementation to be shared today
+through inheritance and templated base classes. However, this may prove to be
+wasted effort when the permanent solution is developed, so we have decided that
+it is better to accept significant amounts of code duplication now, and defer
+this type of refactoring until it is clear what the permanent solution will be.
+
+Mitigation Through Testing
+==========================
+
+The most important tactic that we are using to mitigate the risk of CIR diverging
+from classic codegen is to incorporate two sets of LLVM IR checks in the CIR
+codegen LIT tests. One set checks the LLVM IR that is produced by first
+generating CIR and then lowering that to LLVM IR.
+Another set checks the LLVM IR
+that is produced directly by the classic codegen.
+
+At the time that tests are created, we compare the LLVM IR output from these two
+paths to verify (manually) that any meaningful differences between them are the
+result of known missing features in the current CIR implementation. Whenever
+possible, differences are corrected in the same PR in which the test is added,
+updating the CIR implementation as it is being developed.
+
+These tests also serve a second purpose. They act as sentinels to
+alert us to changes in the classic codegen behavior that will need to be
+accounted for in the CIR implementation. While we appreciate any help from
+developers contributing to classic codegen, our current expectation is that it
+will be the responsibility of the ClangIR contributors to update the CIR
+implementation when these tests fail.
+
+As the CIR implementation gets closer to the goal of IR that is semantically
+equivalent to the LLVM IR produced by the classic codegen, we would like to
+enhance the CIR tests to perform some automatic verification of the equivalence
+of the generated LLVM IR, perhaps using a tool such as Alive2.
+
+Eventually, we would like to be able to run all existing classic codegen tests
+using the CIR path as well.
+
+Other Considerations
+====================
+
+The close modeling of CIR after classic codegen has also meant that the CIR
+dialect often represents language details at a much lower level than it ideally
+should.
+
+In the interest of having a complete working implementation of ClangIR as soon
+as is practical, we have chosen to take the approach of following the classic
+codegen implementation closely in the initial implementation and only raising
+the representation in the CIR dialect to a higher level when there is a clear
+and immediate benefit to doing so.
+
+Over time, we expect to progressively raise the CIR representation to a higher
+level and remove low-level details, including ABI-specific handling, from the
+dialect. However, having a working implementation in place makes it easier to
+verify that the high-level representation and subsequent lowering are correct.
----------------
andykaylor wrote:
My intent in this section is to highlight how the current approach supports the progressive raising of CIR to higher-level representations. I think the details of where we want to go are better left to the long-term vision section below, but maybe I can include a reference to that section here.

https://github.com/llvm/llvm-project/pull/166457
