================
@@ -103,27 +112,226 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<float> ICPVTablePercentageThreshold(
+    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    cl::desc("The percentage threshold of vtable-count / function-count for "
+             "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
----------------
minglotus-6 wrote:

> I think what you mean is that doing so for an earlier candidate delays the
> comparisons for later candidates, but that for the last candidate, only the
> fallback path is affected?

Yes. I updated the comment.

> Do we expect to set this parameter above 1?

Yes. The default is 1 to keep the default behavior conservative. Based on my tests on `-pie` or `pie` binaries, setting it to 2 gives a measurable performance win compared with 1, while setting it to 3 doesn't give stable performance wins across different binaries or across runs.

One interesting thing is that the actual cost of materializing one vtable address point depends on the compile option `fpic/fpie`, and the cost of materializing a vtable address point is comparable to that of materializing a function address if the `fpie/fpic` option is the same.

* For non-pie binaries, `@vtable + address-point-offset` is lowered to an immediate representing the vtable address point. It can be folded into the `icmp` after lowering, something like `icmp #imm, <reg>`.
* For pie (but non-pic) binaries, `@vtable + address-point-offset` is lowered to a pc-relative address, so it takes one instruction to materialize the pc-relative address itself (something like `leaq 2890849(%rip), %rdx # 0x30fe50 <_ZTV8Derived1>` for x86).

https://github.com/llvm/llvm-project/pull/81442
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
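
For readers following the thread, here is a minimal sketch of how the two knobs discussed in this comment could combine: the vtable-count / function-count ratio threshold from the hunk, and the cap on how many vtable address points are compared for one candidate. It is not the code in the PR. `CandidateProfile`, `preferVTableComparison`, and the `MaxVTables` parameter name are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical summary of one indirect-call candidate's profile data.
struct CandidateProfile {
  std::uint64_t FunctionCount; // profiled count of the candidate callee
  std::uint64_t VTableCount;   // profiled count summed over its vtables
  unsigned NumVTables;         // vtable address points that must be compared
};

// Returns true if comparing vtables looks worthwhile for this candidate,
// and false if the pass should fall back to comparing function addresses.
// RatioThreshold plays the role of icp-vtable-percentage-threshold;
// MaxVTables is an assumed name for the "above 1?" parameter in the thread.
bool preferVTableComparison(const CandidateProfile &C, float RatioThreshold,
                            unsigned MaxVTables) {
  // Without profile data there is nothing to justify vtable comparison.
  if (C.FunctionCount == 0 || C.NumVTables == 0)
    return false;

  // Each extra vtable adds a compare on the hot path, so cap the number.
  if (C.NumVTables > MaxVTables)
    return false;

  // Fall back when vtable-count / function-count is below the threshold.
  const double Ratio = static_cast<double>(C.VTableCount) /
                       static_cast<double>(C.FunctionCount);
  return Ratio >= RatioThreshold;
}
```

With a threshold of 0.99, a candidate whose vtable profile covers only 90% of its function count falls back to function comparison, and a candidate that needs two vtable comparisons is accepted only once the cap is raised from 1 to 2.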