labath accepted this revision.
@@ -513,1 +512,11 @@
+ // Find out how many bytes we need to watch after 4-byte alignment boundary.
+ uint8_t watch_size = (addr & 0x03) + size;
+ // We cannot watch zero or more than 4 bytes after 4-byte alignment boundary.
+ if (size == 0 || watch_size > 4)
+ return LLDB_INVALID_INDEX32;
+ // Strip away last two bits of address for byte/half-word/word selection.
+ addr &= ~((lldb::addr_t)3);
> labath wrote:
> > zturner wrote:
> > > This block of code is a bit confusing to me. Is this equivalent to:
> > >
> > > ```
> > > lldb::addr_t start = llvm::alignDown(addr, 4);
> > > lldb::addr_t end = addr + size;
> > > if (start == end || (end-start)>4)
> > > return LLDB_INVALID_INDEX32;
> > > ```
> > I am not sure this is much clearer, especially, as we will later need a
> > separate varaible for `end-start` anyway.
> > +1 for `llvm::alignDown` though.
> There is significant performance difference when we choose addr = addr &
> (~0x03); over llvm::alignDown
> It will eventually effect responsiveness if we keep increasing code size like
> Even if we use -Os then alignDown is squeezed down to 8-9 instructions. I
> havnt tried clang though.
> Instructions needed for addr = addr & (~0x03):
> 4008bd: 48 8b 45 e8 mov -0x18(%rbp),%rax
> 4008c1: 48 83 e0 fc and $0xfffffffffffffffc,%rax
> 4008c5: 48 89 45 e8 mov %rax,-0x18(%rbp)
> Call penalty for llvm::alignDown
> 400918: ba 00 00 00 00 mov $0x0,%edx
> 40091d: 48 89 ce mov %rcx,%rsi
> 400920: 48 89 c7 mov %rax,%rdi
> 400923: e8 ae 00 00 00 callq 4009d6 <_Z9alignDownmmm>
> 400928: 49 89 c4 mov %rax,%r12
> 40092b: 48 8b 5d d8 mov -0x28(%rbp),%rbx
> Disassembly for llvm::alignDown
> 00000000004009d6 <_Z9alignDownmmm>:
> 4009d6: 55 push %rbp
> 4009d7: 48 89 e5 mov %rsp,%rbp
> 4009da: 48 89 7d f8 mov %rdi,-0x8(%rbp)
> 4009de: 48 89 75 f0 mov %rsi,-0x10(%rbp)
> 4009e2: 48 89 55 e8 mov %rdx,-0x18(%rbp)
> 4009e6: 48 8b 45 e8 mov -0x18(%rbp),%rax
> 4009ea: ba 00 00 00 00 mov $0x0,%edx
> 4009ef: 48 f7 75 f0 divq -0x10(%rbp)
> 4009f3: 48 89 55 e8 mov %rdx,-0x18(%rbp)
> 4009f7: 48 8b 45 f8 mov -0x8(%rbp),%rax
> 4009fb: 48 2b 45 e8 sub -0x18(%rbp),%rax
> 4009ff: ba 00 00 00 00 mov $0x0,%edx
> 400a04: 48 f7 75 f0 divq -0x10(%rbp)
> 400a08: 48 0f af 45 f0 imul -0x10(%rbp),%rax
> 400a0d: 48 89 c2 mov %rax,%rdx
> 400a10: 48 8b 45 e8 mov -0x18(%rbp),%rax
> 400a14: 48 01 d0 add %rdx,%rax
> 400a17: 5d pop %rbp
> 400a18: c3 retq
> 400a19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> Number of instructions generated for alignDown with gcc -Os
> 400892: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp
> 400897: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx
> 40089c: 31 d2 xor %edx,%edx
> 40089e: be c2 0a 40 00 mov $0x400ac2,%esi
> 4008a3: bf a0 11 60 00 mov $0x6011a0,%edi
> 4008a8: 48 89 e8 mov %rbp,%rax
> 4008ab: 48 f7 f1 div %rcx
> 4008ae: 48 0f af c1 imul %rcx,%rax
> 4008b2: 48 89 c3 mov %rax,%rbx
I tried this with clang-3.6 -O3. The entire alignDown function call compiled
down to `andq $-4, %rdi`. I'll leave this up to you, as I think it is readable
enough right now, but I don't think we should be afraid of using utility
functions like this. I think LLVM cares a lot more about performance than we,
so I believe we can rely on them knowing what they're doing. Also we have much
bigger higher-level issues affecting performance, the least of which is the
change below, where you re-calculate all watchpoints on every resume.
Comment at: source/Plugins/Process/Linux/NativeRegisterContextLinux_arm.cpp:552
@@ -559,1 +551,3 @@
+ uint8_t current_watch_size, new_watch_size;
+ // Calculate overall size width to be watched by current hardware watchpoint
> labath wrote:
> > Looks much better. Any reason for not using `NextPowerOf2` ? Among other
> > things, it is self-documenting, so you do not need the comment above that.
> so llvm::NextPowerOf2 doesnt serve the intended behaviour.
> llvm::NextPowerOf2 returns 2 for 1, 4 for 2 or 3, and 8 for 4.
> We just want to make sure that our new_size is a power of 2 if its already
> not which will only be the case if size turns out to be 3.
OK, nevermind then.
Comment at: source/Plugins/Process/Linux/NativeThreadLinux.cpp:202
@@ +201,3 @@
+ // Invalidate watchpoint index map to re-sync watchpoint registers.
> labath wrote:
> > If you add this, then the comment below becomes obsolete.
> > Seems like a pretty elegant solution to the incremental watchpoint update
> > problem. I am wondering whether we need to do it on every resume though. I
> > think it should be enough to do it when a watchpoint gets deleted
> > (`NTL::RemoveWatchpoint`). Also, we should throw out the implementation of
> > `NativeRegisterContextLinux_arm::ClearHardwareWatchpoint` -- it's no longer
> > necessary, and it's not even correct anymore.
> This can be improved for performance I intentionally didn't do it to minimize
> changes to the generic component.
I don't care about the performance too much - the code isn't that hot.
What bothers me is that this leaves the code in an inconsistent state -
NativeRegisterContextLinux_arm::ClearHardwareWatchpoint thinks it is doing the
watchpoint removal "the old way" whereas what it does actually does not matter,
as we will nuke the watchpoint registers anyways. Ideally, I'd like to see this
done in the opposite order - first switching the code to use the "nuking"
approach to removing watchpoints, and after that adding the slot reuse code (at
which point it will not need to touch any generic code at all). If you want to
do it the other way then go ahead, but I do expect to see a follow-up change to
clean this up.
lldb-commits mailing list