Re: [gentoo-user] Can't emerge www-client/mozilla-1.7.10-r1 (kernel bug?)

2005-08-03 Thread Zac Medico

Jules Colding wrote:
> On Tue, 2005-08-02 at 18:14 +0200, Richard Fish wrote:
>> Jules Colding wrote:
>>> I tried the aforementioned script just to see if that picked up
>>> anything. Lo and behold... it segfaulted in mkdir. I am beginning to
>>> suspect a subtle reiserfs (mounted with noatime and notail) bug as I am
>>> only seeing segfaults with mkdir and only under high load. There was
>>> something in /var/log/messages as well. Script, output, log and info
>>> below.
>>
>> Um, what version of reiserfs are we talking about here?
>
> Whatever is in gentoo-sources-2.6.12-r6. dmesg tells me that it is
> format 3.6.
>
>> I recall reading recently on lkml (or maybe it was somewhere else..)
>> reports that reiser4 has known and serious problems on non-x86
>> platforms, and maybe that extends to AMD64 as well
>
> Yes, reiser4 is not quite stable yet.
>
>> If you are using the stable reiser3.6 (the only option available in the
>> gentoo-sources or vanilla-sources), well, I would be very surprised if
>> this was a bug there, because noatime and notail are very common options
>> and it is a very popular filesystem.  I would put my money on bad ram or
>> memory timings in this case.
>
> I would expect other things to fail too if it was bad RAM or memory
> timings, right? The only failure scenario is mkdir under high load which
> to me points towards a specific problem area in the code. This is just
> an unqualified guess, naturally...



Can you reproduce the problem if you boot from a livecd?  That could help 
settle the question of whether your problems are rooted in hardware or software.
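
When comparing a run from a livecd against one from the installed system, it helps to note which kernel and which filesystem options were in effect each time. The following is only a rough sketch of that bookkeeping; fs_info and describe_env are hypothetical helper names, and it assumes a Linux system that exposes /proc/mounts.

```shell
#!/bin/bash
# Record the kernel release plus filesystem type and mount options for a
# mount point, so that results from two boots can be compared side by side.
# fs_info and describe_env are hypothetical names used for illustration.

# fs_info MOUNTPOINT [MOUNTS_FILE]
# Prints "fstype options" for every entry whose mount point matches.
# More than one line may be printed if the mount point is listed twice.
fs_info() {
  awk -v mp="$1" '$2 == mp {print $3, $4}' "${2:-/proc/mounts}"
}

# describe_env MOUNTPOINT
describe_env() {
  echo "kernel: $(uname -r)"
  echo "fs:     $(fs_info "$1")"
}

# Only attempt the real lookup where /proc/mounts is available
if [ -r /proc/mounts ]; then
  describe_env /
fi
```

If the kernel release and mount options match between the two boots and the segfaults still differ, that would point away from the filesystem code; if they differ, the comparison says less.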

Zac
--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Can't emerge www-client/mozilla-1.7.10-r1 (kernel bug?)

2005-08-03 Thread Jules Colding
On Wed, 2005-08-03 at 00:57 -0700, Zac Medico wrote:
> Jules Colding wrote:
>> On Tue, 2005-08-02 at 18:14 +0200, Richard Fish wrote:
>>> Jules Colding wrote:
>>
>> I would expect other things to fail too if it was bad RAM or memory
>> timings, right? The only failure scenario is mkdir under high load which
>> to me points towards a specific problem area in the code. This is just
>> an unqualified guess, naturally...
>
> Can you reproduce the problem if you boot from a livecd?  That could
> help settle the question of whether your problems are rooted in
> hardware or software.

Would it? I must access the disk and use the RAM, so wouldn't the
results be identical provided that the same kernel is used?

If not, do you have any particular livecd in mind?

-- 
  jules





Re: [gentoo-user] Can't emerge www-client/mozilla-1.7.10-r1 (kernel bug?)

2005-08-02 Thread Jules Colding
On Mon, 2005-08-01 at 17:24 -0700, Zac Medico wrote:

Hi Zac,

> Hi Jules,
>
> Jules Colding wrote:
>> Hi,
>>
>> I can't emerge mozilla-1.7.10-r1. I don't know if this is just me or if
>> anyone else is seeing the same, but here is what I got. Output and info
>> below.

snip (lots of text)

> What was the solution to the segfault that you reported when you tried
> to remerge automake and autoconf?

There are still occasional segfaults during mkdir -p operations in the
mkinstalldirs script when I do make install of various packages. I
have no clue why but re-running make install makes make pass over
where the error was and continue. Very weird indeed...
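
A minimal sketch of one way to count how often mkdir -p fails outside of make install. The helper name mkdir_stress, the iteration count and the scratch directory are made up for illustration. When a command is killed by a signal, bash reports 128 plus the signal number, so a status of 139 corresponds to SIGSEGV (signal 11).

```shell
#!/bin/bash
# Repeatedly run "mkdir -p" and count the calls that exit abnormally.
# mkdir_stress is a hypothetical helper, not part of any package.

# mkdir_stress BASE_DIR ITERATIONS
# Prints the number of failed mkdir calls on stdout, details on stderr.
mkdir_stress() {
  local base=$1 iterations=$2 failures=0 i status
  [ -n "$base" ] || return 1
  for ((i = 0; i < iterations; i++)); do
    mkdir -p "$base/pass.$i/a/b/c"
    status=$?
    if [ "$status" -ne 0 ]; then
      # 139 = 128 + 11, i.e. the process died from SIGSEGV
      echo "iteration $i: mkdir exited with status $status" >&2
      failures=$((failures + 1))
    fi
  done
  rm -rf "$base"
  echo "$failures"
}

# Example run: 200 iterations below a throwaway directory
scratch=$(mktemp -d)
echo "failed mkdir calls: $(mkdir_stress "$scratch/mkdir-stress" 200)"
rm -rf "$scratch"
```

Since the failures only show up under high load, this would have to run while something else (a large emerge, for instance) keeps the machine busy.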

> If you suspect hardware problems then you should try the memtest
> script mentioned by Francesco in this thread:

I did something like that. I emerged memtest86plus as a boot option and
let it do its thing during the night. It didn't find anything though.

I tried the aforementioned script just to see if that picked up
anything. Lo and behold... it segfaulted in mkdir. I am beginning to
suspect a subtle reiserfs (mounted with noatime and notail) bug as I am
only seeing segfaults with mkdir and only under high load. There was
something in /var/log/messages as well. Script, output, log and info
below.
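
For the log side, something along these lines could pull the interesting lines out of /var/log/messages. This is a sketch only: scan_log is a hypothetical name, and the search patterns are an assumption, not a complete list of what the kernel may print.

```shell
#!/bin/bash
# Print log lines that mention segfaults, oopses or ReiserFS.
# scan_log and its pattern list are illustrative assumptions.

# scan_log LOGFILE
scan_log() {
  grep -i -E 'segfault|oops|reiserfs|general protection' "$1"
}

# Default to the syslog file mentioned above; another path may be given
log=${1:-/var/log/messages}
if [ -r "$log" ]; then
  # Show only the most recent matches
  scan_log "$log" | tail -n 20
else
  echo "cannot read $log" >&2
fi
```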

Regards,
  jules


# memtest.sh #
#!/bin/bash
#
# memtest.sh
#
# Shell script to help isolate memory failures under linux
#
# Author: Doug Ledford  + contributors
#
# (C) Copyright 2000-2002 Doug Ledford; Red Hat, Inc.
# This shell script is released under the terms of the GNU General
# Public License Version 2, June 1991.  If you do not have a copy
# of the GNU General Public License Version 2, then one may be
# retrieved from http://people.redhat.com/dledford/GPL.html
#
# Note, this needs bash2 for the wait command support.

# This is where we will run the tests at
TEST_DIR=/home/colding/tmp

# The location of the linux kernel source file we will be using
if [ -z $SOURCE_FILE ]; then
  SOURCE_FILE=$TEST_DIR/linux.tar.gz
fi

if [ ! -f $SOURCE_FILE ]; then
  echo Missing source file $SOURCE_FILE
  exit 1
fi

# How many passes to run of this test, higher numbers are better
if [ -z $NR_PASSES ]; then
  NR_PASSES=1
fi

# Guess how many megs the unpacked archive is
if [ -z $MEG_PER_COPY ]; then
  MEG_PER_COPY=$(ls -l $SOURCE_FILE | awk '{print int($5/1024/1024) * 4}')
fi

# How many trees do we have to unpack in order to make our trees be larger
# than physical RAM?  If we don't unpack more data than memory can hold
# before we start to run the diff program on the trees then we won't
# actually flush the data to disk and force the system to reread the data
# from disk.  Instead, the system will do everything in RAM.  That doesn't
# work (as far as the memory test is concerned).  It's the simultaneous
# unpacking of data in memory and the read/writes to hard disk via DMA that
# breaks the memory subsystem in most cases.  Doing everything in RAM without
# causing disk I/O will pass bad memory far more often than when you add
# in the disk I/O.
if [ -z $NR_SIMULTANEOUS ]; then
  NR_SIMULTANEOUS=$(free | awk -v meg_per_copy=$MEG_PER_COPY 'NR == 2 {print int($2*1.5/1024/meg_per_copy + (($2/1024)%meg_per_copy >= (meg_per_copy/2)) + (($2/1024/32) < 1))}')
fi

# Should we unpack/diff the $NR_SIMULTANEOUS trees in series or in parallel?
if [ ! -z $PARALLEL ]; then
  PARALLEL=yes
else
  PARALLEL=no
fi
PARALLEL=yes

if [ ! -z $JUST_INFO ]; then
  echo TEST_DIR:   $TEST_DIR
  echo SOURCE_FILE:$SOURCE_FILE
  echo NR_PASSES:  $NR_PASSES
  echo MEG_PER_COPY:   $MEG_PER_COPY
  echo NR_SIMULTANEOUS:$NR_SIMULTANEOUS
  echo PARALLEL:   $PARALLEL
  echo
  exit
fi

cd $TEST_DIR

# Remove any possible left over directories from a cancelled previous run
rm -fr linux linux.orig linux.pass.*

# Unpack the one copy of the source tree that we will be comparing against
tar -xzf $SOURCE_FILE
mv linux linux.orig

i=0
while [ $i -lt $NR_PASSES ]; do
  j=0
  while [ $j -lt $NR_SIMULTANEOUS ]; do
    if [ $PARALLEL = yes ]; then
      (mkdir $j; tar -xzf $SOURCE_FILE -C $j; mv $j/linux linux.pass.$j; rmdir $j) &
    else
      tar -xzf $SOURCE_FILE
      mv linux linux.pass.$j
    fi
    j=`expr $j + 1`
  done
  wait
  j=0
  while [ $j -lt $NR_SIMULTANEOUS ]; do
    if [ $PARALLEL = yes ]; then
      (diff -U 3 -rN linux.orig linux.pass.$j; rm -fr linux.pass.$j) &
    else
      diff -U 3 -rN linux.orig linux.pass.$j
      rm -fr linux.pass.$j
    fi
    j=`expr $j + 1`
  done
  wait
  i=`expr $i + 1`
done

# Clean up after ourselves
rm -fr linux linux.orig linux.pass.*
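
As a worked example of the NR_SIMULTANEOUS arithmetic described in the script's comments, the sketch below feeds fixed numbers into the same awk expression instead of reading the output of free. It assumes that the two comparisons in that expression are >= and <, and nr_simultaneous is a hypothetical wrapper name.

```shell
#!/bin/bash
# Compute how many trees memtest.sh would unpack for a given amount of RAM.
# nr_simultaneous is a hypothetical wrapper around the script's awk formula;
# the ">=" and "<" comparisons are assumed.

# nr_simultaneous TOTAL_KB MEG_PER_COPY
#   TOTAL_KB      total memory in kB (column 2 of the "Mem:" line of free)
#   MEG_PER_COPY  estimated size of one unpacked tree in MB
nr_simultaneous() {
  awk -v total_kb="$1" -v meg_per_copy="$2" 'BEGIN {
    print int(total_kb*1.5/1024/meg_per_copy \
              + ((total_kb/1024)%meg_per_copy >= (meg_per_copy/2)) \
              + ((total_kb/1024/32) < 1))
  }'
}

nr_simultaneous 1048576 160   # 1 GiB of RAM, 160 MB per tree: prints 9
nr_simultaneous 2097152 160   # 2 GiB of RAM, 160 MB per tree: prints 20
```

With 1 GiB of RAM and 160 MB per tree, 9 trees add up to 1440 MB, which is more than physical RAM, so the diff pass has to reread data from disk.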


# Complete script output #
./memtest.sh: line 107: 19536 Segmentation fault  mkdir $j
./memtest.sh: line 107: 19553 Segmentation fault  mkdir $j
Inconsistency detected by ld.so: dynamic-link.h: 151: elf_get_dynamic_info: 
Assertion `info[20]->d_un.d_val == 7' failed!
Inconsistency detected by ld.so: dynamic-link.h: 151: elf_get_dynamic_info: 
Assertion