+Peter Collingbourne +Evgeniy Stepanov (the main developers of HWASAN in LLVM, FYI)

Please note that Peter has recently implemented support for globals in LLVM's HWASAN.

--kcc

On Mon, Sep 9, 2019 at 8:55 AM Matthew Malcomson <matthew.malcom...@arm.com> wrote:
>
> On 09/09/19 11:47, Martin Liška wrote:
> > On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> >> Hello,
> >>
> >> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> >> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be
> >> found here
> >> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> >
> > Hello.
> >
> > I'm happy that you are working on the functionality for GCC and I can
> > share the knowledge that I have of ASAN. I briefly read the patch series
> > and I have multiple questions (and observations):
> >
> > 1) Is the ambition of the patchset to be a software emulation of MTE that
> >    can work on targets that do not support MTE? Is it what clang calls
> >    hwasan-abi=interceptor?
>
> The ambition is to provide a software emulation of MTE for AArch64
> targets that don't support MTE.
> I also hope to have the framework set up so that enabling for other
> architectures is relatively easy and can be done by those interested.
>
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> adding such MTE emulation for "application code" or "platform code (e.g.
> kernel)" respectively.
>
> > 2) Do you have real aarch64 hardware that has MTE support? Would it be
> >    possible in the future to give such a machine to the GCC Compile Farm
> >    for testing purposes?
>
> No, our team doesn't have real MTE hardware. I have been testing on an
> AArch64 machine that has TBI; other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
>
> > 3) I like the idea of sharing internal functions like
> >    ASAN_CHECK/HWASAN_CHECK.
> >    We should benefit from that in the future.
> >
> > 4) Am I correct that due to escape of "tagged" pointers, one needs to have
> >    an entire DSO (dynamic shared object) built with hwasan enabled?
> >    Otherwise, a dereference of a tagged pointer will lead to a segfault
> >    (except with the TBI feature on aarch64)?
>
> Yes, one needs to take pains to avoid the escape of tagged pointers on
> architectures other than AArch64.
>
> I don't believe that compiling the entire DSO with HWASAN enabled is
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
>
> There's an even more fundamental problem of accesses within the
> instrumented binary -- I haven't yet figured out how to remove the tag
> before accesses on architectures without the AArch64 TBI feature.
>
> > 5) Is there documentation/a definition of what shadow memory for memory
> >    tagging looks like?
> >    Is it similar to ASAN, where one can get the tag with:
> >    u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
>
> Yes, it's similar.
>
> From the libhwasan code, the function to fetch a pointer to the shadow
> memory byte corresponding to a memory address is MemToShadow.
>
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>   return (untagged_addr >> kShadowScale) +
>          __hwasan_shadow_memory_dynamic_address;
> }
>
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
>
> > 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI
> >    of libsanitizer
>
> Yes, the sizes of these values define an ABI.
>
> Those particular hooks are added as a demonstration of how something
> like MTE would be implemented on top of this framework (where the
> backend would specify the tag and granule size to match its target's
> architecture).
>
> HWASAN itself would use the hard-coded tag and granule size that matches
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
>
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in
> asan.h, and when using the sanitizer library the macro
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
>
> >> The current patch series is far from complete, but I'm posting the current
> >> state to provide something to discuss at the Cauldron next week.
> >>
> >> In its current state, this sanitizer only works on AArch64 with a custom
> >> kernel to allow tagged pointers in system calls.  This is discussed in the
> >> below link
> >> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel
> >> allows tagged pointers in syscalls.
> >
> > Can you please be more specific? Is MTE in the upstream linux kernel? If so,
> > starting from which version?
>
> I find I can only make complicated statements remotely clear in bullet
> points ;-)
>
> What I was trying to say was:
> - HWASAN from this patch series requires AArch64 TBI.
>   (I have not handled architectures without TBI)
> - The upstream kernel does not accept tagged pointers in syscalls.
>   (programs that use TBI must currently clear tags before passing
>   pointers to the kernel)
> - This patch series doesn't include any way to avoid passing tagged
>   pointers to syscalls.
> - Hence in order to test the sanitizer I'm using a kernel that has been
>   patched to accept tagged pointers in many syscalls.
> - The link to the android.com site is just another source describing the
>   same requirement.
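
The second bullet above (clearing tags before a pointer reaches the kernel) can be illustrated with a minimal sketch. It is not taken from the patch series. It assumes the tag occupies the top byte of a 64-bit address, as with AArch64 TBI, and the helper names `untag_address` and `address_tag` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the tag lives in bits 56..63 of the address (the byte that
// AArch64 TBI ignores on memory accesses).
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = 0xffULL << kTagShift;

// Hypothetical helper: the address with its tag byte cleared, i.e. the
// value a program would pass to a kernel that rejects tagged pointers.
constexpr std::uint64_t untag_address(std::uint64_t tagged) {
  return tagged & ~kTagMask;
}

// Hypothetical helper: the tag carried in the top byte of the address.
constexpr std::uint8_t address_tag(std::uint64_t tagged) {
  return static_cast<std::uint8_t>(tagged >> kTagShift);
}
```

An untagged address passes through `untag_address` unchanged, so such a wrapper could be applied unconditionally at a syscall boundary.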
>
> The support for the relaxed ABI (of accepting tagged pointers in various
> syscalls in the kernel) is being discussed on the kernel mailing list,
> the latest patchset I know of is here:
> https://lkml.org/lkml/2019/7/25/725
>
> I wasn't trying to say anything about MTE in that paragraph, but kernel
> support for MTE is not in upstream linux kernel and is currently being
> worked on.
>
> >> I have also not yet put tests into the DejaGNU framework, but instead have
> >> a simple test file from which the tests will eventually come.  That test
> >> file is attached to this email despite not being in the patch series.
> >>
> >> Something close to this patch series bootstraps and passes most regression
> >> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The
> >> regressions it doesn't pass are all the other sanitizer tests and all
> >> linker plugin tests.
> >> The linker plugin tests fail due to a configuration problem where the
> >> library path is not correctly set.
> >> (I say "something close to this patch series" because I recently made a
> >> change that breaks bootstrap but I believe is the best approach once I've
> >> fixed it, hence for an RFC I'm leaving it in).
> >>
> >> HWASAN works by storing a tag in the top bits of every pointer and a
> >> colour in a shadow memory region corresponding to every area of memory.
> >> On every memory access through a pointer the tag in the pointer is
> >> checked against the colour in shadow memory corresponding to the memory
> >> the pointer is accessing.  If the tag and colour do not match then a
> >> fault is signalled.
> >>
> >> The instrumentation required for this sanitizer has a large overlap with
> >> the instrumentation required for implementing MTE (which has similar
> >> functionality but checks are automatically done in the hardware and
> >> instructions for colouring shadow memory and for managing tags are
> >> provided by the architecture).
> >> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
> >>
> >> We hope to use the HWASAN framework to implement MTE tagging on the stack,
> >> and hence I have a "dummy" patch demonstrating the approach envisaged for
> >> this.
> >
> > What's the situation with heap allocated memory and global variables?
>
> For the heap, whatever library function allocates memory should return a
> tagged pointer and colour the shadow memory accordingly.  This pointer
> can then be treated exactly the same as all other pointers in
> instrumented code.
> On freeing of memory the shadow memory is uncoloured in order to detect
> use-after-free.
>
> For HWASAN this means malloc and friends need to be intercepted, and
> this is done by the runtime library.
>
> For MTE there will need to be some updates in the system libraries.
> A discussion on the way this will be done in glibc has been started here:
> https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html
>
> Global variables are untagged.
>
> For MTE we are planning on having these untagged.
> This is in order to allow uninstrumented object files to be statically
> linked into MTE aware object files.
> Since global object accesses are directly generated into the code, there
> would be no way to tag global objects and still use the code from that
> static object.
>
> Since global objects will not be coloured for MTE, I am not planning on
> colouring them for HWASAN.  There would be a reasonable amount of work,
> including a new mechanism for associating objects with tags.
>
> Having all global variables untagged means that nothing need be done,
> all pointers to global variables will have a tag of zero and the shadow
> memory will correspondingly be left coloured as zero.
>
> >> Though there is still much to implement here, the general approach should
> >> be clear.
> >> Any feedback is welcomed, but I have three main points on which I'm
> >> particularly hoping for external opinions.
> >>
> >> 1) The current approach stores a tag on the RTL representing a given
> >>    variable, in order to implement HWASAN for x86_64 the tag needs to be
> >>    removed before every memory access but not on things like function
> >>    calls.
> >>    Is there any obvious way to handle removing the tag in these places?
> >>    Maybe something with legitimize_address?
> >
> > Not being a target expert, but I bet you'll need to store the tag with an
> > RTL representation of a stack variable.
> >
> > Thanks,
> > Martin
> >
> >> 2) The first draft presented here introduces a new RTL expression called
> >>    ADDTAG.  I now believe that a hook would be neater here but haven't yet
> >>    looked into it.  Do people agree?
> >>    (addtag is introduced in the patch titled "Put tags into each stack
> >>    variable pointer", but the reason it's introduced is so the backend can
> >>    define how this gets implemented with a ~define_expand~ and that's only
> >>    needed for the MTE handling as introduced in "Add in MTE stubs")
> >> 3) This patch series has not yet had much thought put into its command
> >>    line arguments.  I personally quite like the idea of having
> >>    ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow
> >>    memory colour", and MTE being just a hardware acceleration of this
> >>    ability.
> >>    I suspect this idea wouldn't be liked by all and would like to hear
> >>    some opinions.
> >>
> >> Thanks,
> >> Matthew
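
The tag-versus-colour check and the heap colouring described in the quoted thread can be sketched as a toy model. This is only an illustration under stated assumptions, not the libhwasan or GCC implementation. `kShadowScale = 4` is the value quoted from libhwasan above. The top-byte tag position, the `ToyShadow` type and its member names, the tiny fixed-size shadow array (standing in for the region at `__hwasan_shadow_memory_dynamic_address`), and the use of colour zero for "uncoloured" are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned kShadowScale = 4;  // from the thread: 16-byte granules
constexpr unsigned kTagShift = 56;    // assumption: tag in the top byte
constexpr std::uint64_t kTagMask = 0xffULL << kTagShift;

// Hypothetical toy shadow memory covering addresses 0..1023 only.
struct ToyShadow {
  std::array<std::uint8_t, 64> colour{};  // one colour byte per granule

  static std::size_t granule(std::uint64_t untagged) {
    return static_cast<std::size_t>(untagged >> kShadowScale);
  }

  // Colour every granule overlapping [addr, addr + size) and return the
  // tagged pointer, as an allocator would. Requires size > 0.
  std::uint64_t tag_region(std::uint64_t addr, std::size_t size,
                           std::uint8_t tag) {
    for (std::size_t g = granule(addr); g <= granule(addr + size - 1); ++g)
      colour.at(g) = tag;
    return addr | (static_cast<std::uint64_t>(tag) << kTagShift);
  }

  // The check made on each access: pointer tag must equal shadow colour.
  bool access_ok(std::uint64_t tagged) const {
    const auto tag = static_cast<std::uint8_t>(tagged >> kTagShift);
    return colour.at(granule(tagged & ~kTagMask)) == tag;
  }
};
```

In this model, freeing is represented by recolouring the region with zero, so a stale tagged pointer no longer matches, while an untagged pointer to zero-coloured memory (the global-variable case discussed above) always passes.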