================
@@ -103,27 +112,226 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<float> ICPVTablePercentageThreshold(
+    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    cl::desc("The percentage threshold of vtable-count / function-count for "
+             "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
----------------
minglotus-6 wrote:

> I think what you mean is that doing so for an earlier candidate delays the 
> comparisons for later candidates, but that for the last candidate, only the 
> fallback path is affected?

Yes. I updated the comment.

> Do we expect to set this parameter above 1?

Yes. Setting it to 1 keeps the default conservative. In my tests on 
`-pie` or `pie` binaries, setting it to 2 gives a measurable performance win 
compared with 1, while setting it to 3 doesn't give stable performance wins 
across different binaries or across runs.
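
To make the interaction of the two knobs concrete, here is a rough sketch of the per-candidate decision. It is not the code in the patch: `CandidateProfile`, `shouldCompareVTables`, and `MaxNumVTables` are names made up for illustration, and the only rules encoded are the ratio check described in the comment on `icp-vtable-percentage-threshold` and a cap on the number of vtables compared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-candidate profile data; field names are illustrative
// and are not taken from the patch.
struct CandidateProfile {
  uint64_t FunctionCount; // profiled count of the target function
  uint64_t VTableCount;   // summed profiled count of the vtables reaching it
  unsigned NumVTables;    // number of distinct vtables to compare against
};

// Returns true if the candidate would be promoted by comparing vtables,
// false if it would fall back to comparing function addresses.
// MaxNumVTables is an assumed name for the "number of vtables" parameter.
bool shouldCompareVTables(const CandidateProfile &C, float PercentageThreshold,
                          unsigned MaxNumVTables) {
  assert(PercentageThreshold >= 0.0f && "threshold must be non-negative");
  if (C.FunctionCount == 0 || C.NumVTables == 0)
    return false;
  // Each extra vtable adds a compare on the hot path, so cap the number.
  if (C.NumVTables > MaxNumVTables)
    return false;
  // Fall back if vtable-count / function-count is below the threshold.
  double Ratio = static_cast<double>(C.VTableCount) /
                 static_cast<double>(C.FunctionCount);
  return Ratio >= PercentageThreshold;
}
```

With a cap of 1, a candidate reachable through two vtables falls back to function comparison even if its counts match exactly; raising the cap to 2 lets it through.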
 
One interesting thing is that the actual cost of materializing one vtable 
address point depends on the compile option (`fpic/fpie`), and the cost of 
materializing a vtable address point and a function is comparable if the 
`fpie/fpic` option is the same.
  * For non-pie binaries, `@vtable + address-point-offset` is lowered to an 
immediate representing the vtable address point. It could be folded into the 
`icmp` after lowering, something like `icmp #imm, <reg>`.
  * For pie (but non-pic) binaries, `@vtable + address-point-offset` is lowered 
to a pc-relative address, so it takes one instruction to materialize the 
address itself (something like 
`leaq 2890849(%rip), %rdx # 0x30fe50 <_ZTV8Derived1>` for x86).
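
For readers less familiar with the two promotion shapes, here is a source-level sketch using a hand-rolled vtable. All names (`VTable`, `Object`, `derived1Func`, `Derived1VTable`, and the two `callWith...` helpers) are hypothetical stand-ins, not what the pass emits. It only shows which address is compared and which load stays off the hot path.

```cpp
#include <cassert>

struct Object;

// Hand-rolled stand-in for a compiler-generated vtable with a single slot.
struct VTable {
  int (*Func)(const Object *);
};

struct Object {
  const VTable *VPtr; // plays the role of the vtable pointer
  int Value;
};

int derived1Func(const Object *O) { return O->Value + 1; }
int derived2Func(const Object *O) { return O->Value + 2; }

const VTable Derived1VTable = {derived1Func};
const VTable Derived2VTable = {derived2Func};

// Function-based promotion: the function pointer is loaded from the vtable
// first, then compared against the address of the hot target.
int callWithFunctionCompare(const Object *O) {
  assert(O && O->VPtr);
  int (*Target)(const Object *) = O->VPtr->Func; // load from the vtable
  if (Target == derived1Func)
    return derived1Func(O); // direct call
  return Target(O);         // fallback indirect call
}

// VTable-based promotion: the vtable pointer itself is compared, so the
// function-pointer load is only needed on the fallback path.
int callWithVTableCompare(const Object *O) {
  assert(O && O->VPtr);
  if (O->VPtr == &Derived1VTable)
    return derived1Func(O); // direct call, no vtable slot load
  return O->VPtr->Func(O);  // fallback: load the slot, then call indirectly
}
```

In this sketch `&Derived1VTable` and `derived1Func` are the two addresses whose materialization cost is being compared; both are symbol addresses, which is why their cost tracks the same `fpie/fpic` setting.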
 



https://github.com/llvm/llvm-project/pull/81442
