Hi.
On Sat 2003-04-12 at 11:37:15 -0300, [EMAIL PROTECTED] wrote:
> On Sat, 2003-04-12 at 01:21, [EMAIL PROTECTED] wrote:
> > On Fri, 11 Apr 2003, Leonardo Sá wrote:
> >
> > > Why Segmentation fault? This is a brand new Mandrake 9.0 installation
> > > and these commands are the first attempt on this machine to connect to a
> > > mysql server.
> >
> > Try strace'ing the command and sending output to a file. This will give
> > you a clearer picture of what's going wrong. Have you tried reinstalling
> > the package?
> >
>
> I straced the command but the output file was full of terms I couldn't
> understand. Take a look at the tail of the file:
>
> ________________________________________________________________________
> open("/lib/ld-linux.so.2", O_RDONLY) = 3
Here the file /lib/ld-linux.so.2 is opened for reading (got fd=3).
This is the shared library loader ("man execve" goes into that a bit),
we are still in the step where the program is loaded into memory. The
control has not yet been handed over to the new program (it's not even
completely loaded).
So the short story is, there is some fundamental problem. You
shouldn't get a SEGV when system libs are loaded, even if the mysql
binary is corrupted. From what you quoted, it looks like it bails out
very early in the process (you did not get much output on the screen,
did you?).
Since it is something that should not happen at all (and according to
google also does not happen often) I am not sure what to do about it.
My first blind guess would be to check via md5sum if the installation
CDs are corrupted.
Sorry, that is all the help I can offer. :-/
Some further comments, because it seems you were interested in the
meaning of the output by itself. Keep in mind that strace only shows
system calls. There may be an arbitrary amount of other code and
function calls between any two output lines. Also keep in mind
that I am not an OS programmer, but just interpreting what I see here.
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\n\0\000"..., 1024) = 1024
Here the first 1024 bytes (probably to verify the file format is
correct, etc) are read.
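To make that a bit more concrete, here is a small Python sketch (my own illustration, not what the loader actually runs) that decodes the bytes strace printed. The function name describe_elf_header is made up, and the field layout assumes a 32-bit little-endian ELF header, which is what the 5th and 6th bytes of this file indicate.

```python
import struct

# First 28 bytes of the file, copied from the strace line above
# (strace prints non-printable bytes as octal escapes).
header = b"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\n\0\000"


def describe_elf_header(data):
    """Decode the identification bytes and the first ELF header fields.

    Hypothetical helper; assumes the 32-bit little-endian layout.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    # After the 16 identification bytes come e_type and e_machine
    # (16 bits each), then e_version and e_entry (32 bits each).
    e_type, e_machine, e_version, e_entry = struct.unpack_from("<HHII", data, 16)
    return {
        "class": {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown"),
        "byte_order": {1: "little-endian", 2: "big-endian"}.get(ei_data, "unknown"),
        "e_type": e_type,        # 3 = ET_DYN (shared object)
        "e_machine": e_machine,  # 3 = EM_386
        "e_version": e_version,
        "e_entry": e_entry,
    }


info = describe_elf_header(header)
print(info)
```

So the bytes describe a 32-bit shared object for i386, which is what one would expect for a library on that machine.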
> fstat64(3, {st_mode=S_IFREG|0755, st_size=295167, ...}) = 0
stat is called on the file (that is, it requests about the same info
you would see with "ls -l"...). Maybe this is in order to verify
privileges, maybe it's just to know the file size. Whatever.
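If you want to play with that call yourself, Python's os.fstat wraps the same system call. A minimal sketch, using a scratch file that I fill with 295167 bytes only to mirror the size in the trace (file_summary is a made-up helper name):

```python
import os
import stat
import tempfile


def file_summary(fd):
    """Return the parts of fstat() that strace showed: type, mode, size."""
    st = os.fstat(fd)
    return {
        "regular": stat.S_ISREG(st.st_mode),      # S_IFREG in the trace
        "perms": oct(stat.S_IMODE(st.st_mode)),   # e.g. 0755 for the library
        "size": st.st_size,                       # st_size in the trace
    }


with tempfile.TemporaryFile() as tmp:
    tmp.write(b"x" * 295167)   # same size as the file in the trace
    tmp.flush()
    summary = file_summary(tmp.fileno())

print(summary)
```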
> old_mmap(NULL, 75412, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40150000
mmap asks for 75412 bytes of the file to be mapped into memory (hm, I
thought it was 295167 bytes long; I just verified that via ls); the
mapping is now accessible at 0x40150000. Apparently only certain parts
need to be loaded.
Subsequent accesses to the returned memory region are transparently
translated into file reads (with MAP_PRIVATE, writes go to a private
copy rather than back to the file). The main reason to do so for a
program / library is to only have those parts read into memory which
are really accessed. This gives better performance than reading the
whole file by conventional means and relying on the OS to swap out
what is not used.
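A rough Python sketch of the same idea, mapping only part of a file and reading from it like ordinary memory. The scratch file and its contents are invented for the example; the offset has to be a multiple of the allocation granularity (the page size on Linux).

```python
import mmap
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # offsets must be multiples of this

with tempfile.TemporaryFile() as tmp:
    # Three recognisable regions: a page of "A", a page of "B", a short tail.
    tmp.write(b"A" * PAGE + b"B" * PAGE + b"C" * 100)
    tmp.flush()

    # Map only the second page, read-only. Nothing is read from the file
    # until the mapped memory is actually touched.
    with mmap.mmap(tmp.fileno(), PAGE, access=mmap.ACCESS_READ, offset=PAGE) as view:
        first_byte = view[0]     # touching the memory pulls the page in
        mapped_len = len(view)

print(chr(first_byte), mapped_len)
```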
> mprotect(0x40162000, 1684, PROT_NONE) = 0
Here the end of the previous range (0x40150000 + 75412 = 0x40162694)
is made inaccessible. I am not sure why, but the tail of the range
from the first mmap call is overwritten by the next one.
This call in between ensures that a SEGV is triggered if anything
touches this dummy memory range. Maybe it is not certain whether the
next mmap will be needed. Btw, this has nothing to do with the SEGV
you got, because the next mmap call maps something valid into that
range (with read/write access) again.
> old_mmap(0x40162000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12000) = 0x40162000
Now mmap is used again, this time asking for 4096 bytes from position
0x12000 (dec 73728) of the file to be mapped at 0x40162000 (if
possible) for read/write, and gets the request granted. Considering
that 75412-1684 = 73728, the first mmap call would have been
sufficient, except for the fact that the first has PROT_EXEC set, the
second has not, but has PROT_WRITE. So the first seems to be for the
code segment (i.e. the instructions) of the file and the second for
the data segment (i.e. the variables).
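The address arithmetic of the three calls can be checked in a few lines of Python. All numbers are taken from the strace output above; the 4096-byte page size is my assumption (the usual value on i386), and the variable names are made up.

```python
PAGE = 4096  # assumption: 4 KiB pages, as usual on i386

# Values copied from the old_mmap / mprotect / old_mmap lines above.
base, first_len = 0x40150000, 75412
guard_start, guard_len = 0x40162000, 1684
second_start, second_len, second_off = 0x40162000, 4096, 0x12000


def round_up(n, align=PAGE):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


first_end = base + first_len               # end of the requested range
reserved_end = base + round_up(first_len)  # mappings cover whole pages

print(hex(first_end))                  # end of the first mapping
print(hex(guard_start + guard_len))    # end of the mprotect'ed range
print(hex(second_start - base))        # second mapping relative to base
print(hex(second_start + second_len))  # end of the second mapping
```

The second mapping sits at the same distance from the base address as its offset in the file (0x12000), and it ends exactly where the page-rounded first mapping ends.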
> close(3) = 0
The file is closed (AFAICT, it is okay to close the file descriptor of
an mmap'ed file).
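A quick way to convince yourself, again as a Python sketch with an invented scratch file. One caveat: Python's mmap object keeps its own duplicate of the descriptor internally, so this only illustrates the behaviour; at the system call level the mapping itself stays valid after close().

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"still readable")
    view = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # length 0 = whole file
    os.close(fd)              # the descriptor we opened is gone ...
    after_close = view[:]     # ... but the mapped bytes are still there
    view.close()
finally:
    os.remove(path)

print(after_close)
```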
> --- SIGSEGV (Segmentation fault) ---
> +++ killed by SIGSEGV +++
And afterwards something went wrong.
> I'm a newbie at Linux programming (I'm still learning C) so I
> need some help to interpret the above lines. And, yes I've tried
> reinstalling the package.
As a reality check (to determine how much my conclusions differ from
the real source), I just checked what the source actually does.
/lib/ld-linux.so.2 is a link to ld-2.3.2.so on my system and that's
part of the glibc package. Downloaded the source for 9.0
(glibc-2.2.5-16mdk.src.rpm) and had a look-around (in the base source
without all the patches). After searching for mprotect(), I found that
most of the above happens in elf/dl-load.c:_dl_map_object_from_fd().
Ah, the purpose of that stat call is to determine what's the real name
of the file (i.e. whether the file is a link) in order to prevent
loading the same library twice.
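As far as I can tell, such a check boils down to comparing the device and inode numbers that stat returns. A minimal Python sketch of that idea; same_file is my own helper, and the file names in the scratch directory are only borrowed from above for flavour:

```python
import os
import tempfile


def same_file(path_a, path_b):
    """True if both paths resolve to the same file (same device and inode)."""
    a, b = os.stat(path_a), os.stat(path_b)   # os.stat follows symlinks
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "ld-2.3.2.so")
    link = os.path.join(d, "ld-linux.so.2")
    other = os.path.join(d, "other.so")
    for p in (real, other):
        open(p, "wb").close()           # create two distinct empty files
    os.symlink("ld-2.3.2.so", link)     # link -> real, like on my system

    link_matches = same_file(real, link)
    other_matches = same_file(real, other)

print(link_matches, other_matches)
```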
I came close with my assumption about the mprotect. From the source
(line 943ff):
/* This is a position-independent shared object. We can let the
kernel map it anywhere it likes, but we must have space for all
the segments in their specified positions relative to the first.
So we map the first segment without MAP_FIXED, but with its
extent increased to cover all the segments. Then we remove
access from excess portion, and there is known sufficient space
there to remap from the later segments.
As a refinement, sometimes we have an address that we would
prefer to map such objects at; but this is only a preference, the
OS can do whatever it likes. */
In other words, the first mmap call maps more of the file than is
needed in order to be quite sure that the following mmap call (with a
preferred position) will succeed. The mprotect in between is for the
case that some of the overallocated range is not reallocated later. Or
in the words of the source (line 972ff):
/* Change protection on the excess portion to disallow all access;
the portions we do not remap later will be inaccessible as if
unallocated. Then jump into the normal segment-mapping loop to
handle the portion of the segment past the end of the file
mapping. */
The function does not necessarily make another system call before it
returns (and I am not familiar enough with it to determine if one of
the conditions which do system calls is usually true for loading a
program like mysql on Linux). So we cannot say that the SEGV has to be
in this function.
The only caller of _dl_map_object_from_fd is _dl_map_object. The call
to "read" quite certainly comes from a call to open_verify, which
determines - as I guessed - whether the file is an ELF file for the
used architecture.
_dl_map_object has no system calls after the call to
_dl_map_object_from_fd, so we still have not limited the SEGV to a
certain code part. _dl_map_object has more callers than I am willing
to follow.
Sorry for all that noise, but I think the expert list can stand such
in-depth analysis from time to time. ;-)
HTH,
Benjamin.
