One important thing I forgot to mention in my previous email (although I
thought I had done so) is that I am using LLDB to execute the target in
single-step mode, so I am already incurring the 1000x slowdown. Given that,
the extra processing comes practically for free.
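For illustration, here is a minimal sketch of what that per-step processing could look like. It assumes the stepping loop has already decoded the destination address and operand size of the next store instruction. `read_memory` and `step` are hypothetical hooks. In an LLDB Python script they could plausibly wrap `SBProcess.ReadMemory` and `SBThread.StepInstruction`, but that wiring is an assumption and is not shown:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HeapWrite:
    """One observed store: destination address plus the bytes before and after."""
    address: int
    old: bytes
    new: bytes


def step_and_record(
    dest_addr: int,
    size: int,
    read_memory: Callable[[int, int], bytes],  # hypothetical hook: (addr, size) -> bytes
    step: Callable[[], None],                  # hypothetical hook: execute one instruction
    log: List[HeapWrite],
) -> None:
    """Snapshot the store's destination, single-step, then snapshot it again.

    Only writes that actually changed the stored bytes are appended to `log`.
    """
    old = read_memory(dest_addr, size)
    step()
    new = read_memory(dest_addr, size)
    if new != old:
        log.append(HeapWrite(dest_addr, old, new))
```

Keeping the debugger-specific calls behind two callbacks leaves the bookkeeping itself independent of architecture and platform.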

In addition, while I currently focus on Darwin on x86-64, I would prefer to
make decisions that lead to a cross-{architecture, language, platform}
solution, ideally one that does not require modifying the binary.

Regarding your mmap() interception suggestion, I had considered it too, but I
thought that it would require a kernel driver to handle the process's page
faults in order to work properly, since LD_PRELOAD / DYLD_INSERT_LIBRARIES
would not work for programs that issue syscalls directly or link statically
against libc.

I believe that the initial solution, i.e. using "image lookup" and "memory
region $sp", fits my current requirements better, so I am going to give that a
try.
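A minimal sketch of that heuristic follows. It assumes the module test and the per-thread stack regions have already been gathered from LLDB. `is_in_module` stands in for "image lookup -a" returning a match (in the Python API, possibly `target.ResolveLoadAddress(addr).GetModule().IsValid()`), and each stack range stands in for the bounds printed by "memory region $sp" for one thread. The function names and that wiring are assumptions, not an existing LLDB helper:

```python
from typing import Callable, Iterable, Tuple

# Half-open [start, end) interval, e.g. the bounds of one thread's stack region.
Range = Tuple[int, int]


def classify_address(
    addr: int,
    is_in_module: Callable[[int], bool],  # hypothetical: True if "image lookup -a" finds addr
    stack_regions: Iterable[Range],       # hypothetical: one "memory region $sp" result per thread
) -> str:
    """Heuristically classify addr as 'module', 'stack' or 'probably heap'."""
    if is_in_module(addr):
        # Code, global variables and similar live in a loaded image.
        return "module"
    if any(start <= addr < end for start, end in stack_regions):
        return "stack"
    # Anything else is most likely heap, but this cannot be guaranteed.
    return "probably heap"
```

One caveat: "memory region" reports a whole memory mapping. If a thread's stack was carved out of a larger heap mapping, this check may classify neighbouring heap data as stack too.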

Last but not least, I would like to mention that I have found your insights
extremely helpful and I really appreciate your willingness to help me, so thank
you once again! 😊


― Vangelis


> On 7 Feb 2020, at 19:39, Pavel Labath <lab...@google.com> wrote:
> 
> Thanks for the explanation, Vangelis.
> 
> It sounds like binary instrumentation would be the best approach for this, as 
> this is pretty much exactly what msan does. If recompilation is not an 
> option, then you might be able to get something to work via lldb, but I 
> expect this to be _incredibly_ slow (like 1000x, or more). One thing I might 
> consider in your place is some kind of an in-process solution. For instance,
> if you intercept mmap (via LD_PRELOAD or something), then you could set it to
> map all anonymous memory (aka heap) as read-only. This way you'll get a
> SIGSEGV every time somebody tries to write to that memory. You could intercept
> that signal and do your analysis there. Assuming heap writes are not very
> common, this might even give you reasonable performance.
> 
> But this is not going to be super easy either. The trickiest part here will 
> be resuming the program -- you'll need to remap the page read-write, do a 
> single step, and then set it to read-only again.
> 
> pl
> 
> On Fri, 7 Feb 2020 at 01:40, Vangelis Tsiatsianas <vangeli...@icloud.com>
> wrote:
> Thank you for your thorough and timely response, Pavel! 🙂
> 
> Your suggestions might actually cover exactly what I am attempting to
> achieve.
> 
> Unfortunately, I am not able to disclose the exact reason I need it, but I
> want to track all heap writes in order to detect modifications in the heap
> and save both the old and the newly written value.
> 
> For now, this translates to tracking common x86 assembly instructions
> (mov{l, w, d, q}) for a single thread; supporting more “exotic” instructions
> like SIMD, multiple architectures, or multiple threads is not currently a goal.
> 
> Another option would be an LLVM instrumentation pass; however, I would like
> to avoid recompiling and modifying the binary, so I am focusing on LLDB, even
> if I end up missing a few writes that way.
> 
> I was initially looking for a more complete, cross-platform solution (see:
> http://lists.llvm.org/pipermail/llvm-dev/2019-November/136876.html), but
> it proved too time-consuming for the time I have available for my master’s
> (ending in March).
> 
> 
> ― Vangelis
> 
> 
>> On 7 Feb 2020, at 01:20, Pavel Labath <lab...@google.com> wrote:
>> 
>> In general, getting this kind of information is pretty hard, so lldb does 
>> not offer you an out-of-the-box solution for it, but it does give you tools 
>> which you can use to approximate that.
>> 
>> If I wanted to do something like this, the first thing I'd try to do is run 
>> "image lookup -a 0xaddr". If this doesn't return anything then the address 
>> does not correspond to any known module. This rules out code, global 
>> variables, and similar. Then you can run through all of the threads and do a
>> "memory region $sp", which will give you the bounds of the memory allocation
>> around the stack pointer. If your address is in one of these ranges, then
>> it's a stack address. Otherwise, it's probably heap (though you can never be
>> 100% sure of that).
>> 
>> However, it's not fully clear to me what it is that you're trying to do 
>> here. Maybe if you explain the higher level problem that you're trying to 
>> solve, we can come up with a better solution.
>> 
>> pl
>> 
>> On Thu, 6 Feb 2020 at 07:40, Vangelis Tsiatsianas via lldb-dev
>> <lldb-dev@lists.llvm.org> wrote:
>> Hi everyone,
>> 
>> I am looking for a way to tell whether a memory address belongs to the heap 
>> or not.
>> 
>> In other words, I would like to make sure that the address does not reside 
>> within any stack frame (even if the stack of the thread has been allocated 
>> in the heap) and that it’s not a global variable or instruction.
>> 
>> Checking whether it is a valid or correctly allocated address or a
>> memory-mapped file or register is not a goal, so accessing it in order to
>> decide, at the risk of causing a segmentation fault, is an acceptable solution.
>> 
>> I have been thinking of manually checking the address against the boundaries 
>> of each active stack frame, the start and end of the instruction segment and 
>> the locations of all global variables.
>> 
>> However, I would like to ask whether there are better ways to approach this
>> problem in LLDB.
>> 
>> Thank you very much in advance! 🙂
>> 
>> 
>> ― Vangelis
>> 
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> 
