On Thu, Sep 22, 2022 at 10:01 PM Denis Nikitin <[email protected]> wrote:

> Hi Mark,
>
> On Thu, Sep 22, 2022 at 3:38 AM Marc Zyngier <[email protected]> wrote:
> >
> > I was really hoping that you'd just drop the flags from the CFLAGS
> > instead of removing the generated section. Something like:
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index b5c5119c7396..e5b2d43925b4 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -88,7 +88,7 @@ quiet_cmd_hypcopy = HYPCOPY $@
> >
> >  # Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
> >  # This is equivalent to the 'notrace', '__noscs', and '__nocfi'
> annotations.
> > -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS)
> $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
> > +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS)
> $(CC_FLAGS_CFI) -fprofile-sample-use, $(KBUILD_CFLAGS))
> >
> >  # KVM nVHE code is run at a different exception code with a different
> map, so
> >  # compiler instrumentation that inserts callbacks or checks into the
> code may
>
> Sorry, I moved on with a different approach and didn't explain the
> rationale.
>
> Like you mentioned before, the flag `-fprofile-sample-use` does not appear
> in the kernel. And it looks confusing when the flag is disabled or
> filtered out
> here. This was the first reason.
>
> The root cause of the build failure wasn't the compiler profile guided
> optimization but the extra metadata in SHT_REL section which llvm injected
> into kvm_nvhe.tmp.o for further link optimization.
> If we remove the .llvm.call-graph-profile section we fix the build and
> avoid
> potential problems with relocations optimized by the linker. The profile
> guided optimization will still be applied by the compiler.
>
> Let me know what you think about it.
>
> >
> > However, I even failed to reproduce your problem using LLVM 14 as
> > packaged by Debian (if that matters, I'm using an arm64 build
> > machine). I build the kernel with:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-sample-use -j8 vmlinux
> >
> > and the offending object only contains the following sections:
> >
>

Just some comments based on my ChromeOS build experience.

fprofile-sample-use needs the profile file name argument to read the pgo
data from
i.e. -fprofile-sample-use=/path/to/gcov.profile.

Since the path to filename can change, it makes filtering out more
difficult.
It is certainly possible to find and filter the exact argument by some
string search of KCFLAGS.
But passing  -fno-profile-sample-use is easier and less error prone which I
believe the previous patch version tried to do.


> arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o:     file format
> elf64-littleaarch64
> >
> > Sections:
> > Idx Name          Size      VMA               LMA               File
> off  Algn
> >   0 .hyp.idmap.text 00000ae4  0000000000000000  0000000000000000
> 00000800  2**11
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> >   1 .hyp.text     0000e988  0000000000000000  0000000000000000
> 00001800  2**11
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> >   2 .hyp.data..ro_after_init 00000820  0000000000000000
> 0000000000000000  00010188  2**3
> >                   CONTENTS, ALLOC, LOAD, DATA
> >   3 .hyp.rodata   00002e70  0000000000000000  0000000000000000
> 000109a8  2**3
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
> >   4 .hyp.data..percpu 00001ee0  0000000000000000  0000000000000000
> 00013820  2**4
> >                   CONTENTS, ALLOC, LOAD, DATA
> >   5 .hyp.bss      00001158  0000000000000000  0000000000000000
> 00015700  2**3
> >                   ALLOC
> >   6 .comment      0000001f  0000000000000000  0000000000000000
> 00017830  2**0
> >                   CONTENTS, READONLY
> >   7 .llvm_addrsig 000000b8  0000000000000000  0000000000000000
> 0001784f  2**0
> >                   CONTENTS, READONLY, EXCLUDE
> >   8 .altinstructions 00001284  0000000000000000  0000000000000000
> 00015700  2**0
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
> >   9 __jump_table  00000960  0000000000000000  0000000000000000
> 00016988  2**3
> >                   CONTENTS, ALLOC, LOAD, RELOC, DATA
> >  10 __bug_table   0000051c  0000000000000000  0000000000000000
> 000172e8  2**2
> >                   CONTENTS, ALLOC, LOAD, RELOC, DATA
> >  11 __kvm_ex_table 00000028  0000000000000000  0000000000000000
> 00017808  2**3
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
> >  12 .note.GNU-stack 00000000  0000000000000000  0000000000000000
> 00027370  2**0
> >                   CONTENTS, READONLY
> >
> > So what am I missing to trigger this issue? Does it rely on something
> > like PGO, which is not upstream yet? A bit of handholding would be
> > much appreciated.
>
> Right, it relies on the PGO profile.
> On ChromeOS we collect the sample PGO profile from Arm devices with
> enabled CoreSight/ETM. You can find more details on ETM at
> https://www.kernel.org/doc/Documentation/trace/coresight/coresight.rst.
>
>
> https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md
> contains information about the pipeline of collecting, processing, and
> applying
> the profile.
>
>
Generally the difficult part is in collecting a good matching profile for
the workload.
So I think this patch is better than previous since it still keeps the
compiler optimization for the hot code paths
in the file but removes the problematic section.

Thanks,
Manoj



> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
> Thanks,
> Denis
>
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Reply via email to