One important thing I forgot to mention in my previous email (although I thought I had done so) is that I am using LLDB to execute the target in single-step mode, so I am already incurring the 1000x slowdown you mentioned. Given that, the extra processing comes practically for free.

In addition, while I currently focus on Darwin on x86-64, I would prefer to make decisions that lead to a cross-{architecture, language, platform} solution, ideally without affecting the binary.

Regarding your mmap() interception suggestion, I had also considered it, but thought that it would require a kernel driver to handle the page faults of the process in order to function properly, since LD_PRELOAD / DYLD_INSERT_LIBRARIES wouldn’t work for programs that use syscalls directly or statically link with libc.

I believe that the initial solution, i.e. using "image lookup" and "memory region $sp", would better fulfil my current requirements, so I am going to give that a try.

Last but not least, I would like to mention that I’ve found your insights extremely helpful and really appreciated your willingness to help me, so thank you one more time! 😊

― Vangelis

> On 7 Feb 2020, at 19:39, Pavel Labath <lab...@google.com> wrote:
>
> Thanks for the explanation, Vangelis.
>
> It sounds like binary instrumentation would be the best approach for this, as
> this is pretty much exactly what msan does. If recompilation is not an
> option, then you might be able to get something to work via lldb, but I
> expect this to be _incredibly_ slow (like 1000x, or more). One thing I might
> consider in your place is some kind of an in-process solution. For instance,
> if you intercept mmap (via LD_PRELOAD or something) then you could set it to
> map all anonymous memory (aka heap) as read-only. This way you'll get a SIGSEGV
> every time somebody tries to write to that address. You could intercept that
> signal and do your analysis there. Assuming heap writes are not very common,
> this might even give you reasonable performance.
>
> But this is not going to be super easy either. The trickiest part here will
> be resuming the program -- you'll need to remap the page read-write, do a
> single step, and then set it to read-only again.
>
> pl
>
> On Fri, 7 Feb 2020 at 01:40, Vangelis Tsiatsianas <vangeli...@icloud.com> wrote:
> Thank you for your thorough and timely response, Pavel! 🙂
>
> Your suggestions might actually cover completely what I am attempting to
> achieve.
>
> Unfortunately, I am not able to disclose the exact reason I need it, but I
> want to track all heap writes, in order to detect modifications in the heap
> and save both the old and the newly written value.
>
> For now, this translates to tracking common x86 assembly instructions (mov{l,
> w, d, q}) for a single thread; supporting more “exotic” instructions like
> SIMD, multiple architectures or threads is not currently a goal.
>
> Another method could also be an LLVM instrumentation pass, however I would
> like to avoid recompiling and modifying the binary, thus I focus on LLDB,
> even if I end up missing a few writes that way.
>
> I was initially looking for a more complete, cross-platform solution (see:
> http://lists.llvm.org/pipermail/llvm-dev/2019-November/136876.html), but
> the solution proved to be too time consuming for the timeframe I have
> available for my master’s (ending in March).
>
>
> ― Vangelis
>
>
>> On 7 Feb 2020, at 01:20, Pavel Labath <lab...@google.com> wrote:
>>
>> In general, getting this kind of information is pretty hard, so lldb does
>> not offer you an out-of-the-box solution for it, but it does give you tools
>> which you can use to approximate that.
>>
>> If I wanted to do something like this, the first thing I'd try to do is run
>> "image lookup -a 0xaddr". If this doesn't return anything then the address
>> does not correspond to any known module. This rules out code, global
>> variables, and similar. Then you can run through all of the threads and do a
>> "memory region $SP", which will give you bounds of the memory allocation
>> around the stack pointer. If your address is in one of these ranges, then
>> it's a stack address. Otherwise, it's probably heap (though you can never be
>> 100% sure of that).
>>
>> However, it's not fully clear to me what it is that you're trying to do
>> here. Maybe if you explain the higher level problem that you're trying to
>> solve, we can come up with a better solution.
>>
>> pl
>>
>> On Thu, 6 Feb 2020 at 07:40, Vangelis Tsiatsianas via lldb-dev
>> <lldb-dev@lists.llvm.org> wrote:
>> Hi everyone,
>>
>> I am looking for a way to tell whether a memory address belongs to the heap
>> or not.
>>
>> In other words, I would like to make sure that the address does not reside
>> within any stack frame (even if the stack of the thread has been allocated
>> in the heap) and that it’s not a global variable or instruction.
>>
>> Checking whether it is a valid or correctly allocated address or a
>> memory-mapped file or register is not a goal, so accessing it in order to
>> decide, at the risk of causing a segmentation fault, is an accepted solution.
>>
>> I have been thinking of manually checking the address against the boundaries
>> of each active stack frame, the start and end of the instruction segment and
>> the locations of all global variables.
>>
>> However, I would like to ask whether there are better ways to approach this
>> problem in LLDB.
>>
>> Thank you very much in advance! 🙂
>>
>>
>> ― Vangelis
>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
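
For reference, the "image lookup" / "memory region $sp" heuristic discussed in this thread could be sketched with LLDB's Python scripting API roughly as follows. This is only a minimal sketch under stated assumptions, not something taken from the thread: the helper names classify_address and classify_in_lldb are hypothetical, SBTarget.ResolveLoadAddress is assumed to be an adequate stand-in for "image lookup -a", and, as Pavel notes, the "heap" answer is never 100% certain.

```python
from typing import Iterable, Tuple

Range = Tuple[int, int]  # half-open address range: [start, end)


def classify_address(addr: int, in_module: bool,
                     stack_regions: Iterable[Range]) -> str:
    """Apply the heuristic from the thread to a single address.

    in_module:     True if the address resolves to a loaded module
                   (code, globals and similar).
    stack_regions: the memory region around each thread's stack pointer.
    """
    if in_module:
        return "module"
    for start, end in stack_regions:
        if start <= addr < end:
            return "stack"
    # Not in a module and not in any stack region: probably heap,
    # although this cannot be guaranteed.
    return "probably heap"


def classify_in_lldb(target, process, addr: int) -> str:
    """Hypothetical adapter: gather the inputs from a live LLDB session."""
    import lldb  # only importable inside LLDB's Python environment

    # Rough equivalent of "image lookup -a <addr>".
    in_module = target.ResolveLoadAddress(addr).GetModule().IsValid()

    # Rough equivalent of running "memory region $sp" for every thread.
    stack_regions = []
    for i in range(process.GetNumThreads()):
        sp = process.GetThreadAtIndex(i).GetFrameAtIndex(0).GetSP()
        info = lldb.SBMemoryRegionInfo()
        if process.GetMemoryRegionInfo(sp, info).Success():
            stack_regions.append((info.GetRegionBase(), info.GetRegionEnd()))

    return classify_address(addr, in_module, stack_regions)
```

Inside an interactive LLDB session with a stopped process, this could be tried with something like "script print(classify_in_lldb(lldb.target, lldb.process, 0x1000))", where 0x1000 is just a placeholder address. Keeping the decision logic in a separate pure function means it can be checked without a debugger attached.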