I think this is useful: http://stackoverflow.com/questions/9355097/looking-for-system-calls-implementation-on-linux-kernel
On Sun, Jul 15, 2012 at 11:24 PM, Peter Teoh <[email protected]> wrote:

> just sharing my analysis, correct me if wrong:
>
> On Sun, Jul 15, 2012 at 8:36 PM, 王哲 <[email protected]> wrote:
>
>>
>> 2012/7/15 Peter Teoh <[email protected]>
>>
>>> Hi Mulyadi and WangZhe,
>>>
>>> Nice to write to you again....:-).
>>>
>>> On Sun, Jul 15, 2012 at 1:49 PM, Mulyadi Santosa <
>>> [email protected]> wrote:
>>>
>>>> Hi...
>>>>
>>>> On Sun, Jul 15, 2012 at 9:28 AM, 王哲 <[email protected]> wrote:
>>>> > and the second program:
>>>> >
>>>> > #include <stdio.h>
>>>> > #include <unistd.h>
>>>> >
>>>> > int main(void)
>>>> > {
>>>> >     unsigned long value = 0;
>>>> >     value = getpid();
>>>> >     return 0;
>>>> > }
>>>> >
>>>> > and disassembling it:( objdump -d a.out)
>>>> > ...
>>>> > 08048300 <getpid@plt>:
>>>> >  8048300:  ff 25 00 a0 04 08    jmp    *0x804a000
>>>> >  8048306:  68 00 00 00 00       push   $0x0
>>>> >  804830b:  e9 e0 ff ff ff       jmp    80482f0 <_init+0x3c>
>>>>
>>>> Looks like jumping into vsyscall page to me...
>>>>
>>>>
>>> after I start the process, and doing a gdb -p <pid>:
>>>
>>> (gdb) disassemble main
>>> Dump of assembler code for function main:
>>>    0x0000000000400564 <+0>:     push   %rbp
>>>    0x0000000000400565 <+1>:     mov    %rsp,%rbp
>>>    0x0000000000400568 <+4>:     sub    $0x10,%rsp
>>>    0x000000000040056c <+8>:     movq   $0x0,-0x8(%rbp)
>>>    0x0000000000400574 <+16>:    mov    $0x0,%eax
>>>    0x0000000000400579 <+21>:    callq  0x400460 <getpid@plt>
>>>    0x000000000040057e <+26>:    cltq
>>>    0x0000000000400580 <+28>:    mov    %rax,-0x8(%rbp)
>>>    0x0000000000400584 <+32>:    movabs $0x9184e72a000,%rdi
>>>    0x000000000040058e <+42>:    mov    $0x0,%eax
>>>    0x0000000000400593 <+47>:    callq  0x400470 <sleep@plt>
>>>    0x0000000000400598 <+52>:    mov    $0x0,%eax
>>>    0x000000000040059d <+57>:    leaveq
>>>    0x000000000040059e <+58>:    retq
>>> End of assembler dump.
>>> (gdb) disassemble getpid
>>> Dump of assembler code for function getpid:
>>>    0x00007f19ae558530 <+0>:     mov    %fs:0x2d4,%edx
>>>    0x00007f19ae558538 <+8>:     cmp    $0x0,%edx
>>>    0x00007f19ae55853b <+11>:    jle    0x7f19ae558540 <getpid+16>
>>>    0x00007f19ae55853d <+13>:    mov    %edx,%eax
>>>    0x00007f19ae55853f <+15>:    retq
>>>    0x00007f19ae558540 <+16>:    jne    0x7f19ae558554 <getpid+36>
>>>    0x00007f19ae558542 <+18>:    mov    %fs:0x2d0,%eax
>>>    0x00007f19ae55854a <+26>:    test   %eax,%eax
>>>    0x00007f19ae55854c <+28>:    nopl   0x0(%rax)
>>>    0x00007f19ae558550 <+32>:    je     0x7f19ae558554 <getpid+36>
>>>    0x00007f19ae558552 <+34>:    repz retq
>>>    0x00007f19ae558554 <+36>:    mov    $0x27,%eax
>>>    0x00007f19ae558559 <+41>:    syscall
>>>    0x00007f19ae55855b <+43>:    test   %edx,%edx
>>>    0x7f19ae55855d <getpid+45>:  jne    0x7f19ae558552 <getpid+34>
>>>    0x7f19ae55855f <getpid+47>:  mov    %eax,%fs:0x2d0
>>>    0x7f19ae558567 <getpid+55>:  retq
>>>
>>>
>> Hi peter:
>> question1: why your system is "0x00007f19ae558554 <+36>: mov
>> $0x27,%eax",
>> getpid syscall number is 0x14
>>
>> yes u are right - for 32-bit kernel:
>
> In arch/x86/kernel>
> grep getpid *.S
> syscall_table_32.S: .long sys_getpid /* 20 */
>
> but my linux kernel is 64-bit.
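To check the two numbers on your own box, here is a minimal sketch. It assumes Linux with glibc; `raw_getpid` and `print_getpid_number` are names I made up for the example and are not part of any library. `SYS_getpid` comes from `<sys/syscall.h>` and should be 39 (0x27) in an x86-64 build and 20 (0x14) in an i386 build (for instance with `gcc -m32`, if the 32-bit development libraries are installed).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes the syscall() prototype */
#endif
#include <stdio.h>
#include <sys/syscall.h>        /* SYS_getpid */
#include <sys/types.h>          /* pid_t */
#include <unistd.h>             /* getpid(), syscall() */

/* Hypothetical helper: enter the kernel directly instead of going
 * through libc's getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: print the syscall number this binary was built
 * against, plus the PID as seen through both paths. */
void print_getpid_number(void)
{
    printf("SYS_getpid  = %d (0x%x)\n", (int)SYS_getpid, (unsigned)SYS_getpid);
    printf("getpid()    = %ld\n", (long)getpid());
    printf("raw syscall = %ld\n", (long)raw_getpid());
}
```

Calling `print_getpid_number()` from `main()` should print the same PID twice. The `%fs:0x2d4` test at the top of the disassembly above looks like the PID cache that glibc of that era kept in the thread control block; as far as I know newer glibc versions dropped that cache, so the wrapper there goes straight to the syscall.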
>
>> question2: i use gdb disassemble getpid just like you and the
>> result:
>>
>> (gdb) disassemble getpid
>> Dump of assembler code for function getpid:
>>    0xb7771a40 <+0>:     mov    %gs:0x6c,%edx
>>    0xb7771a47 <+7>:     cmp    $0x0,%edx
>>    0xb7771a4a <+10>:    jle    0xb7771a50 <getpid+16>
>>    0xb7771a4c <+12>:    mov    %edx,%eax
>>    0xb7771a4e <+14>:    repz ret
>>    0xb7771a50 <+16>:    jne    0xb7771a62 <getpid+34>
>>    0xb7771a52 <+18>:    mov    %gs:0x68,%eax
>>    0xb7771a58 <+24>:    test   %eax,%eax
>>    0xb7771a5a <+26>:    lea    0x0(%esi),%esi
>>    0xb7771a60 <+32>:    jne    0xb7771a4e <getpid+14>
>>    0xb7771a62 <+34>:    mov    $0x14,%eax
>>    0xb7771a67 <+39>:    call   *%gs:0x10
>>
>
> See the comment for gs in entry_32.S:
>
> /*
>  * User gs save/restore
>  *
>  * %gs is used for userland TLS and kernel only uses it for stack
>  * canary which is required to be at %gs:20 by gcc. Read the comment
>  * at the top of stackprotector.h for more info.
>  *
>  * Local labels 98 and 99 are used.
>  */
> #ifdef CONFIG_X86_32_LAZY_GS
>
> And inside stackprotector.h, the content of which is still beyond my
> complete understanding at the moment, I copied it here:
>
> /*
>  * GCC stack protector support.
>  *
>  * Stack protector works by putting predefined pattern at the start of
>  * the stack frame and verifying that it hasn't been overwritten when
>  * returning from the function. The pattern is called stack canary
>  * and unfortunately gcc requires it to be at a fixed offset from %gs.
>  * On x86_64, the offset is 40 bytes and on x86_32 20 bytes. x86_64
>  * and x86_32 use segment registers differently and thus handles this
>  * requirement differently.
>  *
>  * On x86_64, %gs is shared by percpu area and stack canary. All
>  * percpu symbols are zero based and %gs points to the base of percpu
>  * area. The first occupant of the percpu area is always
>  * irq_stack_union which contains stack_canary at offset 40. Userland
>  * %gs is always saved and restored on kernel entry and exit using
>  * swapgs, so stack protector doesn't add any complexity there.
>  *
>  * On x86_32, it's slightly more complicated. As in x86_64, %gs is
>  * used for userland TLS. Unfortunately, some processors are much
>  * slower at loading segment registers with different value when
>  * entering and leaving the kernel, so the kernel uses %fs for percpu
>  * area and manages %gs lazily so that %gs is switched only when
>  * necessary, usually during task switch.
>  *
>  * As gcc requires the stack canary at %gs:20, %gs can't be managed
>  * lazily if stack protector is enabled, so the kernel saves and
>  * restores userland %gs on kernel entry and exit. This behavior is
>  * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
>  * system.h to hide the details.
>  */
>
> Yes, the gs register is used for userspace TLS and is thus per-thread;
> for more info:
>
> http://www.akkadia.org/drepper/tls.pdf
>
> http://www.ibm.com/developerworks/linux/library/l-user-space-apps/index.html
>
> http://stackoverflow.com/questions/6021273/how-to-allocate-thread-local-storage
>
> (and lots of relevant links besides it).
>
>> can you explain the meaning of "call *%gs:0x10"?
>>
>> Thanks!
>>
>>> And to check the address space:
>>>
>>> (gdb) info sharedlibrary
>>> From                To                  Syms Read   Shared Object Library
>>> 0x00007f19ae4cb8c0  0x00007f19ae5dec60  Yes (*)     /lib/libc.so.6
>>> 0x00007f19ae830af0  0x00007f19ae849704  Yes (*)     /lib64/ld-linux-x86-64.so.2
>>> (*): Shared library is missing debugging information.
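On the "call *%gs:0x10" question: as far as I understand the 32-bit glibc layout, %gs points at the thread control block, and the word at offset 0x10 is its sysinfo field (the stack canary sits right after it at %gs:20, matching the kernel comment quoted above). glibc fills that field from the AT_SYSINFO entry of the auxiliary vector, which is the address of __kernel_vsyscall inside the [vdso] mapping, so the call lands in a kernel-provided stub that picks the actual entry instruction. A minimal sketch for looking at those values from C, assuming glibc 2.16 or later for `getauxval()`; `vdso_base`, `vsyscall_entry` and `print_auxv_info` are names invented for this example:

```c
#include <elf.h>        /* AT_SYSINFO, AT_SYSINFO_EHDR */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */

/* Hypothetical helper: start address of the [vdso] mapping,
 * or 0 if the kernel did not supply one. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: address of __kernel_vsyscall. The kernel is
 * expected to provide AT_SYSINFO only to 32-bit x86 processes, so
 * this is normally 0 in an x86-64 build. */
unsigned long vsyscall_entry(void)
{
    return getauxval(AT_SYSINFO);
}

/* Hypothetical helper: print both values for comparison with the
 * [vdso] line in /proc/<pid>/maps. */
void print_auxv_info(void)
{
    printf("AT_SYSINFO_EHDR (vdso base)     = %#lx\n", vdso_base());
    printf("AT_SYSINFO (__kernel_vsyscall)  = %#lx\n", vsyscall_entry());
}
```

In a 32-bit build the second value should fall inside the [vdso] range of the process's maps file; in a 64-bit build only the first one is expected to be non-zero.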
>>>
>>> and if u want:
>>>
>>> cat /proc/2282/maps
>>>
>>> 7f19ae82a000-7f19ae82b000 rw-p 0017d000 08:05 9922    /lib/libc-2.11.1.so
>>> 7f19ae830000-7f19ae850000 r-xp 00000000 08:05 8824    /lib/ld-2.11.1.so
>>> 7ffff2031000-7ffff2052000 rw-p 00000000 00:00 0       [stack]
>>> 7ffff21af000-7ffff21b0000 r-xp 00000000 00:00 0       [vdso]
>>> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
>>>
>>> Note also that static analysis tools like "objdump -d" are generally
>>> best avoided if you want to understand dynamic addresses. From the
>>> above, we can conclude that the "syscall" instruction shown in the gdb
>>> disassembly is used for the transition to the kernel, embedded inside
>>> libc.so.6. (64-bit code uses "syscall"; "sysenter" is a separate
>>> instruction, not another syntax for the same one, and is what 32-bit
>>> code typically reaches through the vDSO.)
>>>
>>>
>>>> --
>>>> regards,
>>>>
>>>> Mulyadi Santosa
>>>> Freelance Linux trainer and consultant
>>>>
>>>> blog: the-hydra.blogspot.com
>>>> training: mulyaditraining.blogspot.com
>>>>
>>>> _______________________________________________
>>>> Kernelnewbies mailing list
>>>> [email protected]
>>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>>
>>>
>>> --
>>> Regards,
>>> Peter Teoh
>>>
>>
>
> --
> Regards,
> Peter Teoh
>

--
Regards,
Peter Teoh
_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
