Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

Pavel Labath via lldb-commits Thu, 22 Sep 2016 05:58:57 -0700

labath accepted this revision.

================
Comment at: 
source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:513-521
@@ -513,1 +512,11 @@
+
+  // Find out how many bytes we need to watch after 4-byte alignment boundary.
+  uint8_t watch_size = (addr & 0x03) + size;
+
+  // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+  if (size == 0 || watch_size > 4)
+    return LLDB_INVALID_INDEX32;
+
+  // Strip away last two bits of address for byte/half-word/word selection.
+  addr &= ~((lldb::addr_t)3);
 
----------------
omjavaid wrote:
> labath wrote:
> > zturner wrote:
> > > This block of code is a bit confusing to me.  Is this equivalent to:
> > > 
> > > ```
> > > lldb::addr_t start = llvm::alignDown(addr, 4);
> > > lldb::addr_t end = addr + size;
> > > if (start == end || (end-start)>4)
> > >   return LLDB_INVALID_INDEX32;
> > > ```
> > I am not sure this is much clearer, especially, as we will later need a 
> > separate varaible for `end-start` anyway.
> > 
> > +1 for `llvm::alignDown` though.
> There is significant performance difference when we choose addr = addr & 
> (~0x03); over llvm::alignDown
> 
> It will eventually effect responsiveness if we keep increasing code size like 
> this. 
> 
> Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I 
> havnt tried clang though.
> 
> Instructions needed for addr = addr & (~0x03):
>   4008bd:     48 8b 45 e8             mov    -0x18(%rbp),%rax
>   4008c1:     48 83 e0 fc             and    $0xfffffffffffffffc,%rax
>   4008c5:     48 89 45 e8             mov    %rax,-0x18(%rbp)
> 
> Call penalty for llvm::alignDown
> 400918:       ba 00 00 00 00          mov    $0x0,%edx
>   40091d:     48 89 ce                mov    %rcx,%rsi
>   400920:     48 89 c7                mov    %rax,%rdi
>   400923:     e8 ae 00 00 00          callq  4009d6 <_Z9alignDownmmm>
>   400928:     49 89 c4                mov    %rax,%r12
>   40092b:     48 8b 5d d8             mov    -0x28(%rbp),%rbx
> 
> Disassembly for llvm::alignDown
> 00000000004009d6 <_Z9alignDownmmm>:
>   4009d6:     55                      push   %rbp
>   4009d7:     48 89 e5                mov    %rsp,%rbp
>   4009da:     48 89 7d f8             mov    %rdi,-0x8(%rbp)
>   4009de:     48 89 75 f0             mov    %rsi,-0x10(%rbp)
>   4009e2:     48 89 55 e8             mov    %rdx,-0x18(%rbp)
>   4009e6:     48 8b 45 e8             mov    -0x18(%rbp),%rax
>   4009ea:     ba 00 00 00 00          mov    $0x0,%edx
>   4009ef:     48 f7 75 f0             divq   -0x10(%rbp)
>   4009f3:     48 89 55 e8             mov    %rdx,-0x18(%rbp)
>   4009f7:     48 8b 45 f8             mov    -0x8(%rbp),%rax
>   4009fb:     48 2b 45 e8             sub    -0x18(%rbp),%rax
>   4009ff:     ba 00 00 00 00          mov    $0x0,%edx
>   400a04:     48 f7 75 f0             divq   -0x10(%rbp)
>   400a08:     48 0f af 45 f0          imul   -0x10(%rbp),%rax
>   400a0d:     48 89 c2                mov    %rax,%rdx
>   400a10:     48 8b 45 e8             mov    -0x18(%rbp),%rax
>   400a14:     48 01 d0                add    %rdx,%rax
>   400a17:     5d                      pop    %rbp
>   400a18:     c3                      retq   
>   400a19:     0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> 
> Number of instructions generated for alignDown with gcc -Os
> 
>   400892:     48 8b 6c 24 08          mov    0x8(%rsp),%rbp
>   400897:     48 8b 4c 24 10          mov    0x10(%rsp),%rcx
>   40089c:     31 d2                   xor    %edx,%edx
>   40089e:     be c2 0a 40 00          mov    $0x400ac2,%esi
>   4008a3:     bf a0 11 60 00          mov    $0x6011a0,%edi
>   4008a8:     48 89 e8                mov    %rbp,%rax
>   4008ab:     48 f7 f1                div    %rcx
>   4008ae:     48 0f af c1             imul   %rcx,%rax
>   4008b2:     48 89 c3                mov    %rax,%rbx
I tried this with clang-3.6 -O3. The entire alignDown function call compiled 
down to `andq $-4, %rdi`. I'll leave this up to you, as I think it is readable 
enough right now, but I don't think we should be afraid of using utility 
functions like this. I think LLVM cares a lot more about performance than we, 
so I believe we can rely on them knowing what they're doing. Also we have much 
bigger higher-level issues affecting performance, the least of which is the 
change below, where you re-calculate all watchpoints on every resume.


================
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
 
+  uint8_t current_watch_size, new_watch_size;
+  // Calculate overall size width to be watched by current hardware watchpoint 
slot.
----------------
omjavaid wrote:
> labath wrote:
> > Looks much better. Any reason for not using `NextPowerOf2` ? Among other 
> > things, it is self-documenting, so you do not need the comment above that.
> so llvm::NextPowerOf2 doesnt serve the intended behaviour.
> 
> llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.
> 
> We just want to make sure that our new_size is a power of 2 if its already 
> not which will only be the case if size turns out to be 3.
> 
OK, nevermind then.

================
Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202
@@ +201,3 @@
+  // Invalidate watchpoint index map to re-sync watchpoint registers.
+  m_watchpoint_index_map.clear();
+
----------------
omjavaid wrote:
> labath wrote:
> > If you add this, then the comment below becomes obsolete.
> > 
> > Seems like a pretty elegant solution to the incremental watchpoint update 
> > problem. I am wondering whether we need to do it on every resume though. I 
> > think it should be enough to do it when a watchpoint gets deleted 
> > (`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of 
> > `NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer 
> > necessary, and it's not even correct anymore.
> This can be improved for performance I intentionally didn't do it to minimize 
> changes to the generic component. 
I don't care about the performance too much - the code isn't that hot.

What bothers me is that this leaves the code in an inconsistent state - 
NativeRegisterContextLinux_arm::ClearHardwareWatchpoint thinks it is doing the 
watchpoint removal "the old way" whereas what it does actually does not matter, 
as we will nuke the watchpoint registers anyways. Ideally, I'd like to see this 
done in the opposite order - first switching the code to use the "nuking" 
approach to removing watchpoints, and after that adding the slot reuse code (at 
which point it will not need to touch any generic code at all). If you want to 
do it the other way then go ahead, but I do expect to see a follow-up change to 
clean this up.


https://reviews.llvm.org/D24610



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Re: [Lldb-commits] [PATCH] D24610: LLDB Arm Watchpoints: Use single hardware watchpoint slot to watch multiple bytes where possible

Reply via email to