In addition to what Jon Haslam already said, "instrumentation" in DTrace 
can mean several things. Not all DTrace providers enable probes by putting 
jump or trap instructions into application/kernel code at the probepoints. 
The syscall provider is one that doesn't. Neither application nor kernel 
code is "instrumented" when you enable a syscall probe. Instead, as Jon 
showed, the kernel's system call dispatch table is modified, with a bounce 
to the dtrace syscall provider in the slots that you're probing.

Syscalls in Solaris are dispatched through a function call table (sysent). 
An application, making a system call, ends up executing a trap instruction 
(ta 0x8 on SPARC, int/sysenter/syscall/lcall on x86 depending on CPU type) 
with one CPU register containing the system call number (the list of which 
you find in <sys/syscall.h>). The trap handler simply checks this for 
validity (within the range that's defined) and then gets the function 
pointer to call by indexing that table. This is how syscalls get from 
userland into the kernel - they cause a trap, which is a privilege 
switching event.

Why are you not seeing those trap instructions in your app's code? 
Because they're in libc only. The app is not allowed to care how exactly a 
system call is done - it calls the libc function, via the procedure 
linkage table that ld.so fills in when loading/linking the app. Try the 
following to see this stuff:

1. Compile and link your test program
2. Load it into mdb but do not run it yet
3. Disassemble main(), find the PLT:... entries
4. Put a breakpoint at main (main::bp does it in mdb)
5. Run the program
6. When it hits the breakpoint, disassemble it again

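You'll find that the PLT:... entries have been replaced by the actual libc 
function entry points. That's the runtime linker's work. Note that binding 
is lazy by default, so an entry is only patched the first time the function 
is called; set LD_BIND_NOW=1 in the environment if you want everything 
resolved before main() runs.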

If you disassemble those libc funcs, you'll then find the actual 'syscall' 
instruction (on amd64 it's indeed 'syscall').

Bye,
FrankH.

On Thu, 22 Feb 2007, Peter Boros wrote:

> Hi!
>
> I want to see how the syscall instrumentation works at assembly level, so
> similar to this:
>
>> ufs_write::dis -n 3
> ufs_write:                      save      %sp, -0x110, %sp
> ufs_write+4:                    stx       %i4, [%sp + 0x8bf]
> ufs_write+8:                    mov       %i0, %i5
> ufs_write+0xc:                  ldx       [%i0 + 0x10], %i4
>
>> ufs_write::dis -n 3
> ufs_write:                      ba,a      +0x19814c     <0x14c95dc>
> ufs_write+4:                    stx       %i4, [%sp + 0x8bf]
> ufs_write+8:                    mov       %i0, %i5
> ufs_write+0xc:                  ldx       [%i0 + 0x10], %i4
>
>> ufs_write+0x19814c::dis
> 0x14c95b4:                      sethi     %hi(0x1331000), %g1
> 0x14c95b8:                      call      +0x79ebc0e8   <dtrace_probe>
> 0x14c95bc:                      or        %g1, 0xc8, %o7
> 0x14c95c0:                      sethi     %hi(0x4000), %o0
> 0x14c95c4:                      or        %o0, 0x98, %o0
> 0x14c95c8:                      mov       0x300, %o1
> 0x14c95cc:                      call      +0x79ebc0d4   <dtrace_probe>
> 0x14c95d0:                      mov       %i0, %o2
> 0x14c95d4:                      ret
> 0x14c95d8:                      restore
> ---
> 0x14c95dc:                      save      %sp, -0x110, %sp
> 0x14c95e0:                      sethi     %hi(0x4000), %o0
> 0x14c95e4:                      or        %o0, 0x99, %o0
> 0x14c95e8:                      mov       %i0, %o1
> 0x14c95ec:                      mov       %i1, %o2
> 0x14c95f0:                      mov       %i2, %o3
> 0x14c95f4:                      mov       %i3, %o4
> 0x14c95f8:                      mov       %i4, %o5
> 0x14c95fc:                      sethi     %hi(0x1331400), %g1
> 0x14c9600:                      call      +0x79ebc0a0   <dtrace_probe>
> 0x14c9604:                      or        %g1, 0x8c, %o7
>
> So, to examine this, I wrote a program, which makes a system call:
> #include <unistd.h>
> int main(int argc, char *argv[]) {
>        write(0,"helloworld\n",11);
>        return 0;
> }
>
> So, I started examining it with mdb:
> mdb ./syscall
>> main:b
>> :r
> mdb: stop at main
> mdb: target stopped at:
> main:           save      %sp, -0x68, %sp
>> .::dis
> main:                           save      %sp, -0x68, %sp
> main+4:                         st        %i0, [%fp + 0x44]
> main+8:                         st        %i1, [%fp + 0x48]
> main+0xc:                       sethi     %hi(0x10c00), %o1
> main+0x10:                      or        %o1, 0x90, %o1
> main+0x14:                      clr       %o0
> main+0x18:                      call      +0x100ac      <PLT:write>
> main+0x1c:                      mov       0xb, %o2
> main+0x20:                      clr       [%fp - 0x4]
> main+0x24:                      clr       %i0
> main+0x28:                      ret
> main+0x2c:                      restore
> main+0x30:                      clr       %i0
> main+0x34:                      ret
> main+0x38:                      restore
>
> Okay, the syscall is there, dtrace instruments it, if I turn on the
> syscall::write:entry probe.
>
> When I try to examine write itself I get the same results in the
> instrumented and non-instrumented case (I followed the branches, it is
> the same after that too):
>> main+0x100ac::dis
> PLT:exit:                       sethi     %hi(0xf000), %g1
> PLT:exit:                       ba,a      -0x40         <PLT:>
> PLT:exit:                       nop
> PLT:_exit:                      sethi     %hi(0x12000), %g1
> PLT:_exit:                      ba,a      -0x4c         <PLT:>
> PLT:_exit:                      nop
> PLT:write:                      sethi     %hi(0x15000), %g1
> PLT:write:                      ba,a      -0x58         <PLT:>
> PLT:write:                      nop
> PLT:_get_exit_frame_monitor:    sethi     %hi(0x18000), %g1
> PLT:_get_exit_frame_monitor:    ba,a      -0x64         <PLT:>
>
> I tried to ::step the program through the instrumentation, but when the
> probe is on, it consistently crashes at one instruction (with this, at
> some point I should run into dtrace_probe).
>
> How can I see the effect of system call instrumentation at assembly
> level? Maybe it would be easier if I could compile a static binary. I am
> using nevada build 56 on sparc.
>
> Peter