[clang] [Clang][CIR][Doc] Document CIR code duplication plans (PR #166457)

Andy Kaylor via cfe-commits Wed, 19 Nov 2025 13:21:09 -0800

================
@@ -0,0 +1,236 @@
+================================
+ClangIR Code Duplication Roadmap
+================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document describes the general approach to code duplication in the ClangIR
+code generation implementation. It acknowledges specific problems with the
+current implementation, discusses strategies for mitigating the risk inherent in
+the current approach, and describes a general long-term plan for addressing the
+issue.
+
+Background
+==========
+
+The ClangIR code generation is very closely modeled after Clang's LLVM IR code
+generation, and we intend for the CIR produced to eventually be semantically
+equivalent to the LLVM IR produced when not going through ClangIR. However, we
+acknowledge that as the ClangIR implementation is under development, there will
+be differences in semantics, both because we have not yet implemented all
+features of the classic codegen and because the CIR dialect is still evolving
+and does not yet have a way to represent all of the necessary semantics.
+
+We have chosen to model the ClangIR code generation directly after the classic
+codegen, to the point of following identical code structure, using similar names
+and often duplicating the logic because this seemed to be the most certain path
+to producing equivalent results. Having such nearly identical code allows for
+direct comparison between the CIR codegen and the LLVM IR codegen to find what
+is missing or incorrect in the CIR implementation.
+
+However, we recognize that this is not a sustainable permanent solution. As
+bugs are fixed and new features are added to the classic codegen, the process of
+keeping the analogous CIR code up to date will be a purely manual process.
+
+Long term, we need a more sustainable approach.
+
+Current Strategy
+================
+
+Practical considerations require that we make steady progress towards a working
+implementation of ClangIR. This necessity is directly opposed to the goal of
+minimizing code duplication.
+
+For this reason, we have decided to accept a large amount of code duplication
+in the short term, even with the explicit understanding that this is producing
+a significant amount of technical debt as the project progresses.
+
+As the CIR implementation is developed, we often note small pieces of code that
+could be shared with the classic codegen if they were moved to a different part
+of the source, such as a shared utility class in some directory available to
+both codegen implementations or by moving the function into a related AST class.
+It is left to the discretion of the developer and reviewers to decide whether
+such refactoring should be done during the CIR development, or if it is
+sufficient to leave a comment in the code indicating this as an opportunity for
+future improvement. Because much of the current code is likely to change when
+the long term code sharing strategy is complete, we will lean towards only
+implementing refactorings that make sense independent of the code sharing
+problem.
+
+We have discussed various ways that major classes such as CGCXXABI/CIRGenCXXABI
+could be refactored to allow parts of their implementation to be shared today
+through inheritance and templated base classes. However, this may prove to be
+wasted effort when the permanent solution is developed, so we have decided that
+it is better to accept significant amounts of code duplication now, and defer
+this type of refactoring until it is clear what the permanent solution will be.
+
+Mitigation Through Testing
+==========================
+
+The most important tactic that we are using to mitigate the risk of CIR diverging
+from classic codegen is to incorporate two sets of LLVM IR checks in the CIR
+codegen LIT tests. One set checks the LLVM IR that is produced by first
+generating CIR and then lowering that to LLVM IR. Another set checks the LLVM IR
+that is produced directly by the classic codegen.
+
+At the time that tests are created, we compare the LLVM IR output from these two
+paths to verify (manually) that any meaningful differences between them are the
+result of known missing features in the current CIR implementation. Whenever
+possible, differences are corrected in the same PR in which the test is added,
+updating the CIR implementation as it is being developed.
+
+However, these tests serve a second purpose. They also serve as sentinels to
+alert us to changes in the classic codegen behavior that will need to be
+accounted for in the CIR implementation. While we appreciate any help from
+developers contributing to classic codegen, our current expectation is that it
+will be the responsibility of the ClangIR contributors to update the CIR
+implementation when these tests fail.
+
+As the CIR implementation gets closer to the goal of IR that is semantically
+equivalent to the LLVM IR produced by the classic codegen, we would like to
+enhance the CIR tests to perform some automatic verification of the equivalence
+of the generated LLVM IR, perhaps using a tool such as Alive2.
+
+Eventually, we would like to be able to run all existing classic codegen tests
+using the CIR path as well.
+
+Other Considerations
+====================
+
+The close modeling of CIR after classic codegen has also meant that the CIR
+dialect often represents language details at a much lower level than it ideally
+should.
+
+In the interest of having a complete working implementation of ClangIR as soon
+as is practical, we have chosen to take the approach of following the classic
+codegen implementation closely in the initial implementation and only raising
+the representation in the CIR dialect to a higher level when there is a clear
+and immediate benefit to doing so.
+
+Over time, we expect to progressively raise the CIR representation to a higher
+level and remove low-level details, including ABI-specific handling, from the
+dialect. However, having a working implementation in place makes it easier to
+verify that the high level representation and subsequent lowering are correct.
----------------
andykaylor wrote:


My intent in this section is to highlight how the current approach supports the 
progressive raising of CIR to higher-level representations. I think the details 
of where we want to go are better left to the long-term vision section below, 
but maybe I can include a reference to that section here.

https://github.com/llvm/llvm-project/pull/166457