Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

Kostya Serebryany via gcc-patches Mon, 09 Sep 2019 18:06:44 -0700

+Peter Collingbourne +Evgeniy Stepanov (the main developers of HWASAN
in LLVM,  FYI)
Please note that Peter has recently implemented support for globals in
LLVM's HWASAN.


--kcc

On Mon, Sep 9, 2019 at 8:55 AM Matthew Malcomson
<[email protected]> wrote:
>
> On 09/09/19 11:47, Martin Liška wrote:
> > On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> >> Hello,
> >>
> >> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> >> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be
> >> found here:
> >> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> >
> > Hello.
> >
> > I'm happy that you are working on this functionality for GCC, and I can
> > share the knowledge I have of ASAN. I briefly read the patch series and I
> > have multiple questions (and observations):
> >
> > 1) Is the ambition of the patchset to be a software emulation of MTE that
> >     can work on targets that do not support MTE? Is it what clang calls
> >     hwasan-abi=interceptor?
>
> The ambition is to provide a software emulation of MTE for AArch64
> targets that don't support MTE.
> I also hope to have the framework set up so that enabling for other
> architectures is relatively easy and can be done by those interested.
>
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> adding such MTE emulation for "application code" or "platform code (e.g.
> kernel)" respectively.
>
> >
> > 2) Do you have real aarch64 hardware that has MTE support? Would it be
> >     possible in the future to give such a machine to the GCC Compile Farm
> >     for testing purposes?
>
> No, our team doesn't have real MTE hardware.  I have been testing on an
> AArch64 machine that has TBI; other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
>
> >
> > 3) I like the idea of sharing internal functions like
> >     ASAN_CHECK/HWASAN_CHECK.  We should benefit from that in the future.
> >
> > 4) Am I correct that due to escape of "tagged" pointers, one needs to have
> >     an entire DSO (dynamic shared object) built with hwasan enabled?
> >     Otherwise, a dereference of a tagged pointer will lead to a segfault
> >     (except with the TBI feature on aarch64)?
>
>
> Yes, one needs to take pains to avoid the escape of tagged pointers on
> architectures other than AArch64.
>
> I don't believe that compiling the entire DSO with HWASAN enabled is
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
>
> There's an even more fundamental problem of accesses within the
> instrumented binary -- I haven't yet figured out how to remove the tag
> before accesses on architectures without the AArch64 TBI feature.
>
>
> >
> > 5) Is there documentation or a definition of what shadow memory for memory
> >     tagging looks like?  Is it similar to ASAN, where one can get the tag
> >     with:
> >     u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> >
>
> Yes, it's similar.
>
> From the libhwasan code, the function to fetch a pointer to the shadow
> memory byte corresponding to a memory address is MemToShadow.
>
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>    return (untagged_addr >> kShadowScale) +
>           __hwasan_shadow_memory_dynamic_address;
> }
>
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
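The arithmetic in MemToShadow can be checked in isolation.  The following is
only a sketch: kFakeShadowBase is a made-up constant standing in for
__hwasan_shadow_memory_dynamic_address, and MemToShadowSketch is a
hypothetical name, not part of libhwasan.  It shows that every address within
one 16-byte granule maps to the same shadow byte, and that adjacent granules
map to adjacent shadow bytes.

```cpp
#include <cassert>
#include <cstdint>

// Taken from the snippet above: one shadow byte covers 2^4 = 16 bytes.
constexpr unsigned kShadowScale = 4;

// Assumption for illustration only: a fixed number in place of the
// runtime-chosen __hwasan_shadow_memory_dynamic_address.
constexpr std::uint64_t kFakeShadowBase = 0x100000000000ULL;

// Same arithmetic as MemToShadow, on plain integers so it can be inspected.
constexpr std::uint64_t MemToShadowSketch(std::uint64_t untagged_addr) {
  return (untagged_addr >> kShadowScale) + kFakeShadowBase;
}

// Addresses inside one granule share a shadow byte.
static_assert(MemToShadowSketch(0x1000) == MemToShadowSketch(0x100f), "");
// The next granule uses the next shadow byte.
static_assert(MemToShadowSketch(0x1010) == MemToShadowSketch(0x1000) + 1, "");
```
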
>
>
> > 6) Note that things like memtag_tag_size and memtag_granule_size define an
> >     ABI of libsanitizer.
> >
>
> Yes, these values define an ABI.
>
> Those particular hooks are added as a demonstration of how something
> like MTE would be implemented on top of this framework (where the
> backend would specify the tag and granule size to match its target
> architecture).
>
> HWASAN itself would use the hard-coded tag and granule size that matches
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
>
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in
> asan.h, and when using the sanitizer library the macro
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
>
>
> >>
> >> The current patch series is far from complete, but I'm posting the current
> >> state to provide something to discuss at the Cauldron next week.
> >>
> >> In its current state, this sanitizer only works on AArch64 with a custom
> >> kernel that allows tagged pointers in system calls.  This is discussed at
> >> the link below:
> >> https://source.android.com/devices/tech/debug/hwasan
> >
> > Can you please be more specific?  Is MTE in the upstream Linux kernel?  If
> > so, starting from which version?
>
> I find I can only make complicated statements remotely clear in bullet
> points ;-)
>
> What I was trying to say was:
> - HWASAN from this patch series requires AArch64 TBI.
>    (I have not handled architectures without TBI)
> - The upstream kernel does not accept tagged pointers in syscalls.
>    (programs that use TBI must currently clear tags before passing
>     pointers to the kernel)
> - This patch series doesn't include any way to avoid passing tagged
>    pointers to syscalls.
> - Hence in order to test the sanitizer I'm using a kernel that has been
>    patched to accept tagged pointers in many syscalls.
> - The link to the android.com site is just another source describing the
>    same requirement.
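The second bullet (clearing tags before passing pointers to the kernel) comes
down to masking off the top byte.  This is a rough sketch, not code from the
patch series: it assumes the tag occupies bits 56-63 as with AArch64 TBI, and
the names ClearTag and GetTag are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the tag lives in the top byte of a 64-bit pointer (AArch64 TBI).
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kTagShift) - 1;

// Hypothetical helper: strip the tag so the value is a plain address again.
constexpr std::uint64_t ClearTag(std::uint64_t tagged) {
  return tagged & kAddressMask;
}

// Hypothetical helper: read the tag back out of a tagged pointer value.
constexpr std::uint8_t GetTag(std::uint64_t tagged) {
  return static_cast<std::uint8_t>(tagged >> kTagShift);
}

static_assert(ClearTag(0x2a00007fff001234ULL) == 0x7fff001234ULL, "");
static_assert(GetTag(0x2a00007fff001234ULL) == 0x2a, "");
```

An unpatched kernel would need something like ClearTag applied to every
pointer argument of every syscall, which is the part this patch series does
not provide.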
>
>
> Support for the relaxed ABI (accepting tagged pointers in various
> syscalls in the kernel) is being discussed on the kernel mailing list;
> the latest patchset I know of is here:
> https://lkml.org/lkml/2019/7/25/725
>
> I wasn't trying to say anything about MTE in that paragraph, but kernel
> support for MTE is not in the upstream Linux kernel and is currently being
> worked on.
>
> >
> >> I have also not yet put tests into the DejaGNU framework, but instead have
> >> a simple test file from which the tests will eventually come.  That test
> >> file is attached to this email despite not being in the patch series.
> >>
> >> Something close to this patch series bootstraps and passes most regression
> >> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The
> >> regressions it doesn't pass are all the other sanitizer tests and all
> >> linker plugin tests.  The linker plugin tests fail due to a configuration
> >> problem where the library path is not correctly set.
> >> (I say "something close to this patch series" because I recently made a
> >> change that breaks bootstrap but I believe is the best approach once I've
> >> fixed it, hence for an RFC I'm leaving it in).
> >>
> >> HWASAN works by storing a tag in the top bits of every pointer and a
> >> colour in a shadow memory region corresponding to every area of memory.
> >> On every memory access through a pointer the tag in the pointer is checked
> >> against the colour in shadow memory corresponding to the memory the
> >> pointer is accessing.  If the tag and colour do not match then a fault is
> >> signalled.
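The check described above can be modelled on a toy address space.  This is an
illustrative sketch rather than the instrumentation GCC or LLVM emit: ToyShadow
and AddTag are hypothetical names, the tag is assumed to sit in the top byte,
and the granule is assumed to be 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kTagShift = 56;    // assumption: tag in the top byte
constexpr unsigned kShadowScale = 4;  // assumption: 16-byte granules

// Hypothetical model: shadow memory for a small address space starting at 0.
struct ToyShadow {
  std::vector<std::uint8_t> colours;  // one colour per granule, initially 0

  explicit ToyShadow(std::size_t bytes) : colours(bytes >> kShadowScale, 0) {}

  // Colour every granule touched by [addr, addr + size); size must be > 0.
  void Colour(std::uint64_t addr, std::size_t size, std::uint8_t colour) {
    const std::uint64_t last = (addr + size - 1) >> kShadowScale;
    for (std::uint64_t g = addr >> kShadowScale; g <= last; ++g)
      colours.at(g) = colour;
  }

  // The check made before each access: pointer tag must equal shadow colour.
  bool AccessOk(std::uint64_t tagged_ptr) const {
    const std::uint8_t tag = static_cast<std::uint8_t>(tagged_ptr >> kTagShift);
    const std::uint64_t addr =
        tagged_ptr & ((std::uint64_t{1} << kTagShift) - 1);
    return colours.at(addr >> kShadowScale) == tag;
  }
};

// Place a tag in the top byte of an (untagged) address.
inline std::uint64_t AddTag(std::uint64_t addr, std::uint8_t tag) {
  return addr | (std::uint64_t{tag} << kTagShift);
}
```

In this model an untagged pointer to uncoloured memory passes the check
(tag 0 against colour 0), while a pointer that runs past its object into a
differently coloured granule fails it.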
> >>
> >> The instrumentation required for this sanitizer has a large overlap with
> >> the instrumentation required for implementing MTE (which has similar
> >> functionality but checks are automatically done in the hardware and
> >> instructions for colouring shadow memory and for managing tags are
> >> provided by the architecture).
> >> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
> >>
> >> We hope to use the HWASAN framework to implement MTE tagging on the stack,
> >> and hence I have a "dummy" patch demonstrating the approach envisaged for
> >> this.
> >
> > What's the situation with heap allocated memory and global variables?
>
> For the heap, whatever library function allocates memory should return a
> tagged pointer and colour the shadow memory accordingly.  This pointer
> can then be treated exactly the same as all other pointers in
> instrumented code.
> On freeing of memory the shadow memory is uncoloured in order to detect
> use-after-free.
>
> For HWASAN this means malloc and friends need to be intercepted, and
> this is done by the runtime library.
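A toy allocator makes the heap scheme concrete.  This is a sketch under stated
assumptions, not the libhwasan interceptor: ToyHeap is hypothetical, it
bump-allocates from a simulated address space, it takes the size again on Free
for simplicity, and it "uncolours" freed memory by resetting the colour to 0
as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kTagShift = 56;    // assumption: tag in the top byte
constexpr unsigned kShadowScale = 4;  // assumption: 16-byte granules
constexpr std::uint64_t kGranule = std::uint64_t{1} << kShadowScale;

class ToyHeap {
 public:
  explicit ToyHeap(std::size_t bytes) : colours_(bytes >> kShadowScale, 0) {}

  // Allocate whole granules, colour them, and return a tagged "pointer".
  std::uint64_t Alloc(std::size_t size) {
    const std::uint64_t granules = (size + kGranule - 1) >> kShadowScale;
    const std::uint8_t tag = next_tag_;
    // Cycle through 1..255; tag 0 is reserved for uncoloured memory.
    next_tag_ = (next_tag_ == 0xff) ? std::uint8_t{1}
                                    : static_cast<std::uint8_t>(next_tag_ + 1);
    const std::uint64_t first = next_granule_;
    for (std::uint64_t g = first; g < first + granules; ++g)
      colours_.at(g) = tag;
    next_granule_ += granules;
    return (first << kShadowScale) | (std::uint64_t{tag} << kTagShift);
  }

  // Uncolour the block so stale tagged pointers no longer match.
  void Free(std::uint64_t tagged_ptr, std::size_t size) {
    const std::uint64_t first = Untag(tagged_ptr) >> kShadowScale;
    const std::uint64_t granules = (size + kGranule - 1) >> kShadowScale;
    for (std::uint64_t g = first; g < first + granules; ++g)
      colours_.at(g) = 0;
  }

  // The per-access check: pointer tag against shadow colour.
  bool AccessOk(std::uint64_t tagged_ptr) const {
    const std::uint8_t tag = static_cast<std::uint8_t>(tagged_ptr >> kTagShift);
    return colours_.at(Untag(tagged_ptr) >> kShadowScale) == tag;
  }

 private:
  static std::uint64_t Untag(std::uint64_t p) {
    return p & ((std::uint64_t{1} << kTagShift) - 1);
  }
  std::vector<std::uint8_t> colours_;
  std::uint64_t next_granule_ = 0;
  std::uint8_t next_tag_ = 1;
};
```

After Free, an access through the old pointer carries a non-zero tag but finds
colour 0, so the use-after-free is reported in this model.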
>
> For MTE there will need to be some updates in the system libraries.
> A discussion on the way this will be done in glibc has been started here:
> https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html
>
>
>
> Global variables are untagged.
>
> For MTE we are planning on having these untagged.
> This is in order to allow uninstrumented object files to be statically
> linked into MTE aware object files.
> Since global object accesses are directly generated into the code, there
> would be no way to tag global objects and still use the code from that
> static object.
>
>
> Since global objects will not be coloured for MTE, I am not planning on
> colouring them for HWASAN.  Doing so would take a reasonable amount of
> work, including a new mechanism for associating objects with tags.
>
> Having all global variables untagged means that nothing need be done,
> all pointers to global variables will have a tag of zero and the shadow
> memory will correspondingly be left coloured as zero.
>
> >
> >>
> >> Though there is still much to implement here, the general approach should
> >> be clear.  Any feedback is welcomed, but there are three main points on
> >> which I'm particularly hoping for external opinions.
> >>
> >> 1) The current approach stores a tag on the RTL representing a given
> >>     variable.  In order to implement HWASAN for x86_64 the tag needs to be
> >>     removed before every memory access but not on things like function
> >>     calls.
> >>     Is there any obvious way to handle removing the tag in these places?
> >>     Maybe something with legitimize_address?
> >
> > I'm not a target expert, but I bet you'll need to store the tag with an RTL
> > representation of a stack variable.
> >
> > Thanks,
> > Martin
> >
> >> 2) The first draft presented here introduces a new RTL expression called
> >>     ADDTAG.  I now believe that a hook would be neater here but haven't yet
> >>     looked into it.  Do people agree?
> >>     (addtag is introduced in the patch titled "Put tags into each stack
> >>     variable pointer", but the reason it's introduced is so the backend can
> >>     define how this gets implemented with a ~define_expand~ and that's only
> >>     needed for the MTE handling as introduced in "Add in MTE stubs")
> >> 3) This patch series has not yet had much thought put into its command
> >>     line arguments.  I personally quite like the idea of having
> >>     ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow
> >>     memory colour", and MTE being just a hardware acceleration of this
> >>     ability.  I suspect this idea wouldn't be liked by all and would like
> >>     to hear some opinions.
> >>
> >> Thanks,
> >> Matthew
> >>
> >
>
