Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

Matthew Malcomson Mon, 09 Sep 2019 08:55:31 -0700

On 09/09/19 11:47, Martin Liška wrote:
> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>> Hello,
>>
>> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
>> address sanitizer (HWASAN) in GCC.  The document describing HWASAN can be
>> found here:
>> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> 
> Hello.
> 
> I'm happy that you are working on the functionality for GCC and I can provide
> my knowledge that I have with ASAN. I briefly read the patch series and I have
> multiple questions (and observations):
> 
> 1) Is the ambition of the patchset to be a software emulation of MTE that can
>     work on targets that do not support MTE? Is it something that clang
>     names hwasan-abi=interceptor?


The ambition is to provide a software emulation of MTE for AArch64 
targets that don't support MTE.
I also hope to have the framework set up so that enabling for other 
architectures is relatively easy and can be done by those interested.

As I understand it, `hwasan-abi=interceptor` vs `platform` is about 
adding such MTE emulation for "application code" or "platform code (e.g. 
kernel)" respectively.

> 
> 2) Do you have real aarch64 hardware that has MTE support? Would it be
>     possible in the future to give such a machine to the GCC Compile Farm
>     for testing purposes?

No, our team doesn't have real MTE hardware.  I have been testing on an 
AArch64 machine that has TBI; other work in the team that requires MTE 
support is being tested on the Arm "Fast Models" emulator.

> 
> 3) I like the idea of sharing of internal functions like
>     ASAN_CHECK/HWASAN_CHECK.
>     We should benefit from that in the future.
> 
> 4) Am I correct that due to escape of "tagged" pointers, one needs to have an
>     entire DSO (dynamic shared object) built with hwasan enabled? Otherwise, a
>     dereference of a tagged pointer will lead to a segfault (except TBI feature
>     on aarch64)?


Yes, one needs to take pains to avoid the escape of tagged pointers on 
architectures other than AArch64.

I don't believe that compiling the entire DSO with HWASAN enabled is 
enough, since pointers can be passed across DSO boundaries.
I haven't yet looked into how to handle this.

There's an even more fundamental problem of accesses within the 
instrumented binary -- I haven't yet figured out how to remove the tag 
before accesses on architectures without the AArch64 TBI feature.
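
To be clear about what "removing the tag" means at the source level, here 
is a rough sketch (not code from the patch series).  It assumes an 8-bit 
tag in bits 56..63 of a user-space address whose top byte is otherwise 
zero; the names kTagShift, TagOf, Untag, WithTag and LoadInt are made up 
for illustration.  The open question above is how to get the compiler to 
emit the equivalent of Untag before every access, not what the operation is.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-bit tag stored in bits 56..63 of a 64-bit address.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = std::uint64_t{0xff} << kTagShift;

// Extract the tag carried in the top byte of the address.
inline std::uint8_t TagOf(std::uint64_t addr) {
  return static_cast<std::uint8_t>(addr >> kTagShift);
}

// Clear the tag bits so the address is valid on a target without TBI.
inline std::uint64_t Untag(std::uint64_t addr) {
  return addr & ~kTagMask;
}

// Put `tag` into the top byte of an address that carries no tag yet.
inline std::uint64_t WithTag(std::uint64_t untagged, std::uint8_t tag) {
  assert((untagged & kTagMask) == 0);
  return untagged | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// What an instrumented load would have to do without TBI:
// strip the tag first, then dereference the plain address.
inline int LoadInt(std::uint64_t tagged_addr) {
  return *reinterpret_cast<const int *>(
      static_cast<std::uintptr_t>(Untag(tagged_addr)));
}
```

The same Untag step is what a program has to apply today before handing 
a pointer to a function call that ends up in an uninstrumented library.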


> 
> 5) Is there a documentation/definition of how shadow memory for memory 
> tagging looks like?
> Is it similar to ASAN, where one can get to tag with:
> u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> 

Yes, it's similar.

From the libhwasan code, the function to fetch a pointer to the shadow 
memory byte corresponding to a memory address is MemToShadow.

constexpr uptr kShadowScale = 4;
inline uptr MemToShadow(uptr untagged_addr) {
   return (untagged_addr >> kShadowScale) +
          __hwasan_shadow_memory_dynamic_address;
}

https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
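
To make the mapping concrete, here is a self-contained sketch of the 
check built on the same arithmetic.  It is only an illustration: the 
shadow region is simulated with an ordinary array instead of 
__hwasan_shadow_memory_dynamic_address, the tag is assumed to be the top 
byte of the pointer, and ShadowSim, Colour and TagMatches are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using uptr = std::uint64_t;

constexpr uptr kShadowScale = 4;    // one shadow byte per 16-byte granule
constexpr unsigned kTagShift = 56;  // assumption: tag lives in the top byte

// Toy stand-in for the shadow region, covering addresses [0, bytes).
struct ShadowSim {
  std::vector<std::uint8_t> shadow;

  explicit ShadowSim(std::size_t bytes)
      : shadow((bytes >> kShadowScale) + 1, 0) {}

  // Same shift as MemToShadow, but indexing into the toy array.
  std::uint8_t &At(uptr untagged_addr) {
    std::size_t index =
        static_cast<std::size_t>(untagged_addr >> kShadowScale);
    assert(index < shadow.size());
    return shadow[index];
  }

  // Colour every granule that overlaps [addr, addr + size) with `tag`.
  void Colour(uptr untagged_addr, std::size_t size, std::uint8_t tag) {
    assert(size > 0);
    uptr first = untagged_addr >> kShadowScale;
    uptr last = (untagged_addr + size - 1) >> kShadowScale;
    for (uptr g = first; g <= last; ++g)
      At(g << kShadowScale) = tag;
  }

  // The check: the pointer's tag must equal the colour of the granule
  // that the untagged address falls in.
  bool TagMatches(uptr tagged_ptr) {
    std::uint8_t tag = static_cast<std::uint8_t>(tagged_ptr >> kTagShift);
    uptr untagged = tagged_ptr & ~(uptr{0xff} << kTagShift);
    return At(untagged) == tag;
  }
};
```

Colouring 32 bytes at 0x40 with tag 0x2a makes a pointer with that tag 
pass the check anywhere in those two granules and fail one byte past them.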


> 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI
> of libsanitizer
> 

Yes, the sizes of these values define an ABI.

Those particular hooks are added as a demonstration for how something 
like MTE would be implemented on top of this framework (where the 
backend would specify the tag and granule size to match their target's 
architecture).

HWASAN itself would use the hard-coded tag and granule size that matches 
what libsanitizer uses.
https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36

I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in 
asan.h, and when using the sanitizer library the macro 
`HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
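
As a sketch of why these two numbers are ABI: the compiler has to pad and 
align each tagged object to a whole number of granules so that no two 
objects share a shadow byte, and the runtime has to agree on the same 
granule.  The values below (8-bit tags, 16-byte granules, i.e. 
1 << kShadowScale) are my assumption of what the constants would be to 
match libhwasan; the names are illustrative and not the ones in asan.h.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values, chosen to match libhwasan's kShadowScale = 4.
constexpr unsigned kTagSizeBits = 8;   // stand-in for HWASAN_TAG_SIZE
constexpr std::size_t kGranule = 16;   // stand-in for HWASAN_TAG_GRANULE_SIZE

static_assert((kGranule & (kGranule - 1)) == 0,
              "granule size must be a power of two");

// Number of distinct tag values a pointer can carry.
constexpr unsigned kNumTags = 1u << kTagSizeBits;

// Size an object must be padded to so it owns whole granules.
constexpr std::size_t RoundUpToGranule(std::size_t size) {
  return (size + kGranule - 1) & ~(kGranule - 1);
}

// Number of shadow bytes that describe an object of `size` bytes.
inline std::size_t ShadowBytesFor(std::size_t size) {
  assert(size > 0);
  return RoundUpToGranule(size) / kGranule;
}
```

If the compiler and the library disagreed on either constant, the shadow 
bytes coloured by one side would not line up with the checks done by the 
other.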


>>
>> The current patch series is far from complete, but I'm posting the current
>> state to provide something to discuss at the Cauldron next week.
>>
>> In its current state, this sanitizer only works on AArch64 with a custom
>> kernel to allow tagged pointers in system calls.  This is discussed in the
>> below link
>> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel
>> allows tagged pointers in syscalls.
> 
> Can you please be more specific? Is the MTE in the upstream Linux kernel?
> If so, starting from which version?

I find I can only make complicated statements remotely clear in bullet 
points ;-)

What I was trying to say was:
- HWASAN from this patch series requires AArch64 TBI.
   (I have not handled architectures without TBI)
- The upstream kernel does not accept tagged pointers in syscalls.
   (programs that use TBI must currently clear tags before passing
    pointers to the kernel)
- This patch series doesn't include any way to avoid passing tagged
   pointers to syscalls.
- Hence in order to test the sanitizer I'm using a kernel that has been
   patched to accept tagged pointers in many syscalls.
- The link to the android.com site is just another source describing the
   same requirement.


The support for the relaxed ABI (of accepting tagged pointers in various 
syscalls in the kernel) is being discussed on the kernel mailing list; 
the latest patchset I know of is here:
https://lkml.org/lkml/2019/7/25/725

I wasn't trying to say anything about MTE in that paragraph, but kernel 
support for MTE is not in the upstream Linux kernel and is currently 
being worked on.

> 
>> I have also not yet put tests into the DejaGNU framework, but instead have a
>> simple test file from which the tests will eventually come.  That test file
>> is attached to this email despite not being in the patch series.
>>
>> Something close to this patch series bootstraps and passes most regression
>> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The regressions
>> it doesn't pass are all the other sanitizer tests and all linker plugin
>> tests.  The linker plugin tests fail due to a configuration problem where
>> the library path is not correctly set.
>> (I say "something close to this patch series" because I recently made a
>> change that breaks bootstrap but I believe is the best approach once I've
>> fixed it, hence for an RFC I'm leaving it in).
>>
>> HWASAN works by storing a tag in the top bits of every pointer and a colour
>> in a shadow memory region corresponding to every area of memory.  On every
>> memory access through a pointer the tag in the pointer is checked against
>> the colour in shadow memory corresponding to the memory the pointer is
>> accessing.  If the tag and colour do not match then a fault is signalled.
>>
>> The instrumentation required for this sanitizer has a large overlap with the
>> instrumentation required for implementing MTE (which has similar
>> functionality but checks are automatically done in the hardware and
>> instructions for colouring shadow memory and for managing tags are provided
>> by the architecture).
>> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
>>
>> We hope to use the HWASAN framework to implement MTE tagging on the stack,
>> and hence I have a "dummy" patch demonstrating the approach envisaged for
>> this.
> 
> What's the situation with heap allocated memory and global variables?

For the heap, whatever library function allocates memory should return a 
tagged pointer and colour the shadow memory accordingly.  This pointer 
can then be treated exactly the same as all other pointers in 
instrumented code.
On freeing of memory the shadow memory is uncoloured in order to detect 
use-after-free.

For HWASAN this means malloc and friends need to be intercepted, and 
this is done by the runtime library.
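
The allocate/colour/uncolour cycle can be sketched with a toy model.  This 
is not how the runtime library is written: it hands out addresses from a 
simulated range without touching real memory, picks tags by cycling 
through 1..255, takes the size again on free for simplicity, and every 
name in it (ToyTaggedHeap, Alloc, Free, AccessOk) is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using uptr = std::uint64_t;
constexpr unsigned kTagShift = 56;    // assumption: tag in the top byte
constexpr std::size_t kGranule = 16;  // assumption: 16-byte granules

// Toy heap over a simulated address range [0, bytes).
class ToyTaggedHeap {
 public:
  explicit ToyTaggedHeap(std::size_t bytes) : shadow_(bytes / kGranule, 0) {}

  // Reserve whole granules, colour them, and return a tagged pointer.
  uptr Alloc(std::size_t size) {
    std::size_t granules = (size + kGranule - 1) / kGranule;
    assert(granules > 0 && next_ + granules <= shadow_.size());
    std::uint8_t tag = NextTag();
    for (std::size_t g = 0; g < granules; ++g) shadow_[next_ + g] = tag;
    uptr addr = static_cast<uptr>(next_) * kGranule;
    next_ += granules;
    return addr | (static_cast<uptr>(tag) << kTagShift);
  }

  // Uncolour the block so stale pointers no longer match.
  void Free(uptr tagged_ptr, std::size_t size) {
    std::size_t first = GranuleOf(tagged_ptr);
    std::size_t granules = (size + kGranule - 1) / kGranule;
    for (std::size_t g = 0; g < granules; ++g) shadow_[first + g] = 0;
  }

  // The check instrumented code would perform before an access.
  bool AccessOk(uptr tagged_ptr) const {
    std::size_t g = GranuleOf(tagged_ptr);
    return g < shadow_.size() && shadow_[g] == TagOf(tagged_ptr);
  }

 private:
  static std::uint8_t TagOf(uptr p) {
    return static_cast<std::uint8_t>(p >> kTagShift);
  }
  static std::size_t GranuleOf(uptr p) {
    return static_cast<std::size_t>((p & ~(uptr{0xff} << kTagShift)) /
                                    kGranule);
  }
  // Cycle through 1..255; 0 is kept for "uncoloured" memory.
  std::uint8_t NextTag() {
    last_tag_ = static_cast<std::uint8_t>(last_tag_ % 255 + 1);
    return last_tag_;
  }

  std::vector<std::uint8_t> shadow_;
  std::size_t next_ = 0;
  std::uint8_t last_tag_ = 0;
};
```

After Free, an access through the old pointer fails the check because its 
non-zero tag no longer matches the zeroed shadow bytes, which is the 
use-after-free case.  An access past the end of a block fails for the 
same reason as long as the neighbouring granule has a different colour.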

For MTE there will need to be some updates in the system libraries.
A discussion on the way this will be done in glibc has been started here:
https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html



Global variables are untagged.

For MTE we are planning on having these untagged.
This is in order to allow uninstrumented object files to be statically 
linked into MTE aware object files.
Since global object accesses are directly generated into the code, there 
would be no way to tag global objects and still use the code from that 
static object.


Since global objects will not be coloured for MTE, I am not planning on 
colouring them for HWASAN.  There would be a reasonable amount of work, 
including a new mechanism for associating objects with tags.

Having all global variables untagged means that nothing need be done: 
all pointers to global variables will have a tag of zero and the shadow 
memory will correspondingly be left coloured as zero.

> 
>>
>> Though there is still much to implement here, the general approach should be
>> clear.  Any feedback is welcomed, but I have three main points that I'm
>> particularly hoping for external opinions.
>>
>> 1) The current approach stores a tag on the RTL representing a given
>>     variable.  In order to implement HWASAN for x86_64 the tag needs to be
>>     removed before every memory access but not on things like function
>>     calls.
>>     Is there any obvious way to handle removing the tag in these places?
>>     Maybe something with legitimize_address?
> 
> Not being a target expert, but I bet you'll need to store the tag with an RTL
> representation of a stack variable.
> 
> Thanks,
> Martin
> 
>> 2) The first draft presented here introduces a new RTL expression called
>>     ADDTAG.  I now believe that a hook would be neater here but haven't yet
>>     looked into it.  Do people agree?
>>     (addtag is introduced in the patch titled "Put tags into each stack
>>     variable pointer", but the reason it's introduced is so the backend can
>>     define how this gets implemented with a ~define_expand~ and that's only
>>     needed for the MTE handling as introduced in "Add in MTE stubs")
>> 3) Not much thought has yet gone into the command line arguments for this
>>     patch series.  I personally quite like the idea of having
>>     ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow
>>     memory colour", and MTE being just a hardware acceleration of this
>>     ability.
>>     I suspect this idea wouldn't be liked by all and would like to hear some
>>     opinions.
>>
>> Thanks,
>> Matthew
>>
>
