Re: [Blinux-rus] E: Ошибка сегментирования ru_t ts на ядре 2.6.32

Michael Pozhidaev Tue, 21 Dec 2010 18:33:12 -0800

Hello, Igor B. Poretsky!

> Абсолютно правы. Специально сейчас удостоверился. И локаль
> устанавливается ru_RU.koi8r для LC_CTYPE.


А-а-а... Так оно ещё и getopt() с setlocale() использует? С этого и надо
было начинать. До текущего момента я полагал, что Вам удалось
извернуться и оторвать все зависимости на glibc. Вам известен способ
отконфигурировать glibc, чтобы сделать его подходящим для всех
существующих ядер?

Кстати, обращаю Ваше внимание, что современные ядра поддерживают несколько
вариантов выполнения системных вызовов. Конкретный вариант выбирается во
время загрузки ядра. Чтобы режимы в ядре совпадали бы с режимом в
приложениях, системные вызовы пропускаются через специальный gate,
доступ к которому осуществляется через механизм динамических
библиотек. В случае статических бинарей, которые не могут пользоваться
этим механизмом,  как понимаю, начинается
рулетка.  Режим совпадёт - клёво, работаем, не совпадёт - самое место
появиться segfault. Таким образом, ныне статический режим бинарей
подходит разве только для приложений с функциональностью не больше, чем
/bin/true.

Короче, Игорь, у меня просьба хотя бы просто пересобрать бинарник, чтобы
он зацепил нечто более адекватное для современных реалий, или
прилинковать статически libdb3, но убрать статический режим, чтобы
бинарник мог нормально общаться с системой и glibc-core, коли оно всё
равно используется.

Падение наблюдалось на достаточно старой машине с мобильным Celeron
1033MHz. Детали по части системных вызовов:

http://www.trilithium.com/johan/2005/08/linux-gate/

What is linux-gate.so.1?

When you use the ldd utility on a reasonably recent Linux system you'll 
frequently see a
reference to an ethereal entity known as linux-gate.so.1:

ldd /bin/sh
        linux-gate.so.1 =>  (0xffffe000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7fb2000)
        libc.so.6 => /lib/libc.so.6 (0xb7e7c000)
        /lib/ld-linux.so.2 (0xb7fba000)

What's so strange about that? It's just a dynamically loaded library, right?

Sort of, for sufficiently generous definitions of dynamically loaded library. 
The lack of file
name in the output indicates that ldd was unable to locate a file by that name. 
Indeed, any
attempt to find the corresponding file – whether manually or by software 
designed to
automatically load and analyze such libraries – will be unsuccessful.

From time to time this is a cause of befuddlement and frustration for users as 
they go
searching for a non-existent system file. You can confidently tell users on 
this futile quest
that there's not supposed to be a linux-gate.so.1 file present anywhere on the 
file system;
it's a virtual DSO, a shared object exposed by the kernel at a fixed address in 
every process'
memory:

cat /proc/self/maps
08048000-0804c000 r-xp 00000000 08:03 7971106    /bin/cat
0804c000-0804d000 rwxp 00003000 08:03 7971106    /bin/cat
0804d000-0806e000 rwxp 0804d000 00:00 0          [heap]
b7e88000-b7e89000 rwxp b7e88000 00:00 0 
b7e89000-b7fb8000 r-xp 00000000 08:03 8856588    /lib/libc-2.3.5.so
b7fb8000-b7fb9000 r-xp 0012e000 08:03 8856588    /lib/libc-2.3.5.so
b7fb9000-b7fbc000 rwxp 0012f000 08:03 8856588    /lib/libc-2.3.5.so
b7fbc000-b7fbe000 rwxp b7fbc000 00:00 0 
b7fc2000-b7fd9000 r-xp 00000000 08:03 8856915    /lib/ld-2.3.5.so
b7fd9000-b7fdb000 rwxp 00016000 08:03 8856915    /lib/ld-2.3.5.so
bfac3000-bfad9000 rw-p bfac3000 00:00 0          [stack]
ffffe000-fffff000 ---p 00000000 00:00 0          [vdso]

Here cat prints its own memory map. The line marked [vdso] is the 
linux-gate.so.1 object in
that process, a single memory page mapped at address ffffe000. A program can 
determine the
location of the shared object in memory by examining an AT_SYSINFO entry in the 
ELF auxiliary
vector. The auxiliary vector (auxv) is an array of pointers passed to new 
processes in the same
way program arguments (argv) and environment variables (envp) are.

In theory the address could differ between processes, but as far as I know the 
Linux kernel
always maps it at a fixed location. The sample output above come from an x86 
box where
processes live in plain old 32-bit address spaces divided into pages of 4096 
bytes, making
ffffe000 the penultimate page. The very last page is reserved to catch accesses 
through invalid
pointers, e.g. dereferencing a decremented NULL pointer or a MAP_FAILED pointer 
returned from
mmap.

Since all processes share the same object at the same location, it's easy to 
extract a copy of
it if we want to take a closer look at it. For example, we can simply ask dd to 
dump the page
from its own memory (carefully choosing an output name different from 
linux-gate.so.1 to avoid
creating a file that's not supposed to exist):

dd if=/proc/self/mem of=linux-gate.dso bs=4096 skip=1048574 count=1
1+0 records in
1+0 records out

We skip 1048574 because there are 2^20 = 1048576 pages in total and we want to 
extract the next
to last page. The result looks like any other shared ELF object file:

file -b linux-gate.dso
ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped

objdump -T linux-gate.dso

linux-gate.dso:     file format elf32-i386

DYNAMIC SYMBOL TABLE:
ffffe400 l    d  .text  00000000              
ffffe460 l    d  .eh_frame_hdr  00000000              
ffffe484 l    d  .eh_frame      00000000              
ffffe608 l    d  .useless       00000000              
ffffe400 g    DF .text  00000014  LINUX_2.5   __kernel_vsyscall
00000000 g    DO *ABS*  00000000  LINUX_2.5   LINUX_2.5
ffffe440 g    DF .text  00000007  LINUX_2.5   __kernel_rt_sigreturn
ffffe420 g    DF .text  00000008  LINUX_2.5   __kernel_sigreturn

These symbols are entry points for the rt_sigreturn/sigreturn functions and for 
making virtual
system calls. On the x86 platform linux-gate.so.1 was initially called 
linux-vsyscall.so.1, but
this was changed during development to get a common name accurately reflecting 
its purpose
across platforms: to act as a gateway between user and kernel space. Not all 
platforms need
virtual syscalls, but they must be fairly important for x86 to warrant this 
elaborate
mechanism.

Traditionally, x86 system calls have been done with interrupts. You may 
remember that the way
to request operating system functions was via interrupt 33 (21h) back in the 
bad old MS-DOS
days. Windows system calls are buried beneath layers of user-mode APIs, but at 
some point they
too boil down to int 0x2e. Similarly, syscall implementations in Linux and 
other *nix kernels
have been using int 0x80.

It turns out, though, that system calls invoked via interrupts are remarkably 
slow on the more
recent members of the x86 processor family. An int 0x80 system call can be as 
much as an order
of magnitude slower on a 2 GHz Pentium 4 than on an 850 MHz Pentium III. The 
impact on
performance resulting from this could easily be significant, at least for 
applications that do
a lot of system calls.

Intel recognized this problem early on and introduced a more efficient system 
call interface in
the form of sysenter and sysexit instructions. This fast system call feature 
first appeared in
the Pentium Pro processor, but due to hardware bugs it's actually broken in 
most of the early
CPUs. That's why you may see claims that sysenter was introduced with Pentium 
II or even
Pentium III.

The hardware problems also help explain why it took quite some time before 
operating systems
started supporting fast system calls. If we ignore earlier experimental 
patches, Linux support
for sysenter appeared in December 2002 during kernel 2.5 development. That's 
ten years after
the instruction was defined! Microsoft started using sysenter only slightly 
earlier, in Windows
XP.

You can find out if your Linux machine is using the sysenter instruction for 
system calls by
disassembling __kernel_vsyscall:

objdump -d --start-address=0xffffe400 --stop-address=0xffffe414 linux-gate.dso

linux-gate.dso:     file format elf32-i386

Disassembly of section .text:

ffffe400 <__kernel_vsyscall>:
ffffe400:       51                      push   %ecx
ffffe401:       52                      push   %edx
ffffe402:       55                      push   %ebp
ffffe403:       89 e5                   mov    %esp,%ebp
ffffe405:       0f 34                   sysenter 
ffffe407:       90                      nop    
ffffe408:       90                      nop    
ffffe409:       90                      nop    
ffffe40a:       90                      nop    
ffffe40b:       90                      nop    
ffffe40c:       90                      nop    
ffffe40d:       90                      nop    
ffffe40e:       eb f3                   jmp    ffffe403 <__kernel_vsyscall+0x3>
ffffe410:       5d                      pop    %ebp
ffffe411:       5a                      pop    %edx
ffffe412:       59                      pop    %ecx
ffffe413:       c3                      ret    

The preferred way of invoking a system call is determined by the kernel at boot 
time, and
evidently this box uses sysenter. On an older machine you may see int 0x80 
being used instead.
In case you are struggling to make sense of that jump (like I was the first 
time I saw it) you
might be interested to learn that it's there because Linus Torvalds is a 
disgusting pig and
proud of it. (It's a trick to handle restarting of system calls with six 
parameters).

-- 
Michael Pozhidaev. Tomsk, Russia.
Russian info page: http://www.marigostra.ru/
-- 
Blinux-rus mailing list
[email protected]
http://www.a11ywiki.org/cgi-bin/mailman/listinfo/blinux-rus

Re: [Blinux-rus] E: Ошибка сегментирования ru_t ts на ядре 2.6.32

Ответить