Re: [RFC]confusion about syscall

Peter Teoh Sun, 15 Jul 2012 08:33:46 -0700

I think this is useful:

http://stackoverflow.com/questions/9355097/looking-for-system-calls-implementation-on-linux-kernel


On Sun, Jul 15, 2012 at 11:24 PM, Peter Teoh <[email protected]> wrote:

> Just sharing my analysis; correct me if I am wrong:
>
> On Sun, Jul 15, 2012 at 8:36 PM, 王哲 <[email protected]> wrote:
>
>>
>>
>> 2012/7/15 Peter Teoh <[email protected]>
>>
>>> Hi Mulyadi and WangZhe,
>>>
>>> Nice to write to you again....:-).
>>>
>>> On Sun, Jul 15, 2012 at 1:49 PM, Mulyadi Santosa <
>>> [email protected]> wrote:
>>>
>>>> Hi...
>>>>
>>>> On Sun, Jul 15, 2012 at 9:28 AM, 王哲 <[email protected]> wrote:
>>>> > and the second program:
>>>> >
>>>> > #include <stdio.h>
>>>> > #include <unistd.h>
>>>> >
>>>> > int main(void)
>>>> > {
>>>> >     unsigned long value = 0;
>>>> >     value = getpid();
>>>> >     return 0;
>>>> > }
>>>> >
>>>> > and disassembling it:( objdump -d a.out)
>>>> > ...
>>>> > 08048300 <getpid@plt>:
>>>> >  8048300:    ff 25 00 a0 04 08        jmp    *0x804a000
>>>> >  8048306:    68 00 00 00 00           push   $0x0
>>>> >  804830b:    e9 e0 ff ff ff           jmp    80482f0 <_init+0x3c>
>>>>
>>>> Looks like jumping into vsyscall page to me...
>>>>
>>>>
>>> After starting the process and attaching to it with gdb -p <pid>:
>>>
>>> (gdb) disassemble main
>>> Dump of assembler code for function main:
>>>    0x0000000000400564 <+0>: push   %rbp
>>>    0x0000000000400565 <+1>: mov    %rsp,%rbp
>>>    0x0000000000400568 <+4>: sub    $0x10,%rsp
>>>    0x000000000040056c <+8>: movq   $0x0,-0x8(%rbp)
>>>    0x0000000000400574 <+16>: mov    $0x0,%eax
>>>    0x0000000000400579 <+21>: callq  0x400460 <getpid@plt>
>>>    0x000000000040057e <+26>: cltq
>>>    0x0000000000400580 <+28>: mov    %rax,-0x8(%rbp)
>>>    0x0000000000400584 <+32>: movabs $0x9184e72a000,%rdi
>>>    0x000000000040058e <+42>: mov    $0x0,%eax
>>>    0x0000000000400593 <+47>: callq  0x400470 <sleep@plt>
>>>    0x0000000000400598 <+52>: mov    $0x0,%eax
>>>    0x000000000040059d <+57>: leaveq
>>>    0x000000000040059e <+58>: retq
>>> End of assembler dump.
>>> (gdb) disassemble getpid
>>> Dump of assembler code for function getpid:
>>>    0x00007f19ae558530 <+0>: mov    %fs:0x2d4,%edx
>>>    0x00007f19ae558538 <+8>: cmp    $0x0,%edx
>>>    0x00007f19ae55853b <+11>: jle    0x7f19ae558540 <getpid+16>
>>>    0x00007f19ae55853d <+13>: mov    %edx,%eax
>>>    0x00007f19ae55853f <+15>: retq
>>>    0x00007f19ae558540 <+16>: jne    0x7f19ae558554 <getpid+36>
>>>    0x00007f19ae558542 <+18>: mov    %fs:0x2d0,%eax
>>>    0x00007f19ae55854a <+26>: test   %eax,%eax
>>>    0x00007f19ae55854c <+28>: nopl   0x0(%rax)
>>>    0x00007f19ae558550 <+32>: je     0x7f19ae558554 <getpid+36>
>>>    0x00007f19ae558552 <+34>: repz retq
>>>    0x00007f19ae558554 <+36>: mov    $0x27,%eax
>>>    0x00007f19ae558559 <+41>: syscall
>>>    0x00007f19ae55855b <+43>: test   %edx,%edx
>>>    0x7f19ae55855d <getpid+45>: jne    0x7f19ae558552 <getpid+34>
>>>    0x7f19ae55855f <getpid+47>: mov    %eax,%fs:0x2d0
>>>    0x7f19ae558567 <getpid+55>: retq
>>>
>>>
>>    Hi Peter:
>>        Question 1: why does your system show
>> "0x00007f19ae558554 <+36>: mov $0x27,%eax" when the getpid syscall
>> number is 0x14?
>>
> Yes, you are right, for a 32-bit kernel:
>
> In arch/x86/kernel:
> grep getpid *.S
> syscall_table_32.S: .long sys_getpid /* 20 */
>
> But my Linux kernel is 64-bit, where getpid is syscall number 39 (0x27).
>
>
>
>>        Question 2: I used gdb to disassemble getpid just like you, and
>> this is the result:
>>
>>
>>     (gdb) disassemble getpid
>>  Dump of assembler code for function getpid:
>>    0xb7771a40 <+0>:    mov    %gs:0x6c,%edx
>>    0xb7771a47 <+7>:    cmp    $0x0,%edx
>>    0xb7771a4a <+10>:    jle    0xb7771a50 <getpid+16>
>>    0xb7771a4c <+12>:    mov    %edx,%eax
>>    0xb7771a4e <+14>:    repz ret
>>    0xb7771a50 <+16>:    jne    0xb7771a62 <getpid+34>
>>    0xb7771a52 <+18>:    mov    %gs:0x68,%eax
>>    0xb7771a58 <+24>:    test   %eax,%eax
>>    0xb7771a5a <+26>:    lea    0x0(%esi),%esi
>>    0xb7771a60 <+32>:    jne    0xb7771a4e <getpid+14>
>>    0xb7771a62 <+34>:    mov    $0x14,%eax
>>    0xb7771a67 <+39>:    call   *%gs:0x10
>>
>>
>
> See the comment for gs in entry_32.S:
>
> /*
>  * User gs save/restore
>  *
>  * %gs is used for userland TLS and kernel only uses it for stack
>  * canary which is required to be at %gs:20 by gcc.  Read the comment
>  * at the top of stackprotector.h for more info.
>  *
>  * Local labels 98 and 99 are used.
>  */
> #ifdef CONFIG_X86_32_LAZY_GS
>
> And here is the comment from stackprotector.h, which I do not yet
> completely understand:
>
> /*
>  * GCC stack protector support.
>  *
>  * Stack protector works by putting predefined pattern at the start of
>  * the stack frame and verifying that it hasn't been overwritten when
>  * returning from the function.  The pattern is called stack canary
>  * and unfortunately gcc requires it to be at a fixed offset from %gs.
>  * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
>  * and x86_32 use segment registers differently and thus handles this
>  * requirement differently.
>  *
>  * On x86_64, %gs is shared by percpu area and stack canary.  All
>  * percpu symbols are zero based and %gs points to the base of percpu
>  * area.  The first occupant of the percpu area is always
>  * irq_stack_union which contains stack_canary at offset 40.  Userland
>  * %gs is always saved and restored on kernel entry and exit using
>  * swapgs, so stack protector doesn't add any complexity there.
>  *
>  * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
>  * used for userland TLS.  Unfortunately, some processors are much
>  * slower at loading segment registers with different value when
>  * entering and leaving the kernel, so the kernel uses %fs for percpu
>  * area and manages %gs lazily so that %gs is switched only when
>  * necessary, usually during task switch.
>  *
>  * As gcc requires the stack canary at %gs:20, %gs can't be managed
>  * lazily if stack protector is enabled, so the kernel saves and
>  * restores userland %gs on kernel entry and exit.  This behavior is
>  * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
>  * system.h to hide the details.
>  */
>
> Yes, the gs register is used for userspace TLS and is thus set up
> per-thread; for more info:
>
> http://www.akkadia.org/drepper/tls.pdf
>
>
> http://www.ibm.com/developerworks/linux/library/l-user-space-apps/index.html
>
>
> http://stackoverflow.com/questions/6021273/how-to-allocate-thread-local-storage
>
> (and lots of relevant links besides it).
>
>
>
>>   Can you explain the meaning of "call   *%gs:0x10"?
>>
>>   Thanks!
>>
>>
>>
>>
>>> And to check the address space:
>>>
>>> (gdb) info sharedlibrary
>>> From                To                  Syms Read   Shared Object Library
>>> 0x00007f19ae4cb8c0  0x00007f19ae5dec60  Yes (*)     /lib/libc.so.6
>>> 0x00007f19ae830af0  0x00007f19ae849704  Yes (*)     /lib64/ld-linux-x86-64.so.2
>>> (*): Shared library is missing debugging information.
>>>
>>>
>>> And if you want the full memory map:
>>>
>>> cat /proc/2282/maps
>>>
>>> 7f19ae82a000-7f19ae82b000 rw-p 0017d000 08:05 9922    /lib/libc-2.11.1.so
>>> 7f19ae830000-7f19ae850000 r-xp 00000000 08:05 8824    /lib/ld-2.11.1.so
>>> 7ffff2031000-7ffff2052000 rw-p 00000000 00:00 0       [stack]
>>> 7ffff21af000-7ffff21b0000 r-xp 00000000 00:00 0       [vdso]
>>> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0       [vsyscall]
>>>
>>> Note also that static analysis tools like "objdump -d" are generally
>>> best avoided if you want to understand dynamic addresses.   From the
>>> above, we can conclude that the "syscall" instruction (AMD's fast
>>> system call instruction, used in 64-bit mode; Intel's "sysenter" is a
>>> separate instruction, typically seen in 32-bit code) is used for the
>>> transition to the kernel, as embedded inside libc.so.6.
>>>
>>>
>>>> --
>>>> regards,
>>>>
>>>> Mulyadi Santosa
>>>> Freelance Linux trainer and consultant
>>>>
>>>> blog: the-hydra.blogspot.com
>>>> training: mulyaditraining.blogspot.com
>>>>
>>>> _______________________________________________
>>>> Kernelnewbies mailing list
>>>> [email protected]
>>>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Peter Teoh
>>>
>>
>>
>
>
> --
> Regards,
> Peter Teoh
>



-- 
Regards,
Peter Teoh

_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
