Re: Bonnie++ with 1024k stripe SW/RAID5 causes kernel to goto D-state

2007-09-29 Thread Chris Snook

Justin Piszcz wrote:

Kernel: 2.6.23-rc8 (older kernels do this as well)

When running the following command:
/usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:10:16:64


It hangs unless I increase various md/raid parameters such as 
stripe_cache_size, etc.


# ps auxww | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       276  0.0  0.0      0     0 ?        D    12:14   0:00 [pdflush]
root       277  0.0  0.0      0     0 ?        D    12:14   0:00 [pdflush]
root      1639  0.0  0.0      0     0 ?        D<   12:14   0:00 [xfsbufd]
root      1767  0.0  0.0   8100   420 ?        Ds   12:14   0:00 
root      2895  0.0  0.0   5916   632 ?        Ds   12:15   0:00 /sbin/syslogd -r


See the bottom for more details.

Is this normal?  Does md only work without tuning up to a certain stripe 
size?  I use a RAID 5 with a 1024k stripe that works fine with my 
optimizations applied, but if I just boot the system and run bonnie++ on it 
without applying them, it hangs in D-state.  When I run the optimizations, 
it comes back out of D-state.  Pretty weird?


Not at all.  1024k stripes are way outside the norm.  If you do something way 
outside the norm, and don't tune for it in advance, don't be terribly surprised 
when something like bonnie++ brings your box to its knees.


That's not to say we couldn't make md auto-tune itself more intelligently, but 
this isn't really a bug.  With a sufficiently huge amount of RAM, you'd be able 
to dynamically allocate the buffers that you're not pre-allocating with 
stripe_cache_size, but bonnie++ is eating that up in this case.
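
For a rough sense of scale: the stripe cache uses one page per member device 
per cache entry, so its footprint is roughly stripe_cache_size * page_size * 
nr_disks.  A quick back-of-the-envelope sketch, assuming 4 KiB pages and the 
10-disk array from the report:

STRIPE_CACHE_SIZE=16384          # entries, the value the tuning script sets
NR_DISKS=10                      # member devices in the reported array
PAGE_SIZE=$(getconf PAGESIZE)    # 4096 on this sort of box
BYTES=$((STRIPE_CACHE_SIZE * PAGE_SIZE * NR_DISKS))
echo "stripe cache: $((BYTES / 1024 / 1024)) MiB"   # ~640 MiB with these numbers

The default (256 entries, IIRC) works out to only about 10 MiB here, which is 
why the untuned array has so little headroom under heavy writeback.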


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Bonnie++ with 1024k stripe SW/RAID5 causes kernel to goto D-state

2007-09-29 Thread Justin Piszcz

Kernel: 2.6.23-rc8 (older kernels do this as well)

When running the following command:
/usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:10:16:64

It hangs unless I increase various md/raid parameters such as 
stripe_cache_size, etc.


# ps auxww | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       276  0.0  0.0      0     0 ?        D    12:14   0:00 [pdflush]
root       277  0.0  0.0      0     0 ?        D    12:14   0:00 [pdflush]
root      1639  0.0  0.0      0     0 ?        D<   12:14   0:00 [xfsbufd]
root      1767  0.0  0.0   8100   420 ?        Ds   12:14   0:00 
root      2895  0.0  0.0   5916   632 ?        Ds   12:15   0:00 /sbin/syslogd -r


See the bottom for more details.

Is this normal?  Does md only work without tuning up to a certain stripe 
size?  I use a RAID 5 with a 1024k stripe that works fine with my 
optimizations applied, but if I just boot the system and run bonnie++ on it 
without applying them, it hangs in D-state.  When I run the optimizations, 
it comes back out of D-state.  Pretty weird?


(Again, without this script, bonnie++ hangs in D-state until it is run.)

Optimization script:

#!/bin/bash

# source profile
. /etc/profile

# Tell user what's going on.
echo "Optimizing RAID Arrays..."

# Define DISKS.
cd /sys/block
DISKS=$(/bin/ls -1d sd[a-z])

# This step must come first.
# See: http://www.3ware.com/KB/article.aspx?id=11050
echo "Setting max_sectors_kb to 128 KiB"
for i in $DISKS
do
  echo "Setting /dev/$i to 128 KiB..."
  echo 128 > /sys/block/"$i"/queue/max_sectors_kb
done

# This step comes next.
echo "Setting nr_requests to 512 KiB"
for i in $DISKS
do
  echo "Setting /dev/$i to 512K KiB"
  echo 512 > /sys/block/"$i"/queue/nr_requests
done

# Set read-ahead (in 512-byte sectors): 65536 sectors = 32 MiB.
echo "Setting read-ahead to 32 MiB for /dev/md3"
blockdev --setra 65536 /dev/md3

# Set stripe_cache_size for RAID5.
# Units are cache entries (one page per member device), so
# 16384 * 4 KiB * 10 disks is roughly 640 MiB of stripe cache.
echo "Setting stripe_cache_size to 16384 for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size

# Set minimum and maximum raid rebuild speed to 30 MB/s.
# (sync_speed_min/max are in KB/s, so 30 MB/s = 30000.)
echo "Setting minimum and maximum resync speed to 30 MB/s..."
echo 30000 > /sys/block/md0/md/sync_speed_min
echo 30000 > /sys/block/md0/md/sync_speed_max
echo 30000 > /sys/block/md1/md/sync_speed_min
echo 30000 > /sys/block/md1/md/sync_speed_max
echo 30000 > /sys/block/md2/md/sync_speed_min
echo 30000 > /sys/block/md2/md/sync_speed_max
echo 30000 > /sys/block/md3/md/sync_speed_min
echo 30000 > /sys/block/md3/md/sync_speed_max

# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
  echo "Disabling NCQ on $i"
  echo 1 > /sys/block/"$i"/device/queue_depth
done

--

Once this runs, everything works fine again.
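
To double-check that the settings actually took effect, I just read them back 
out of sysfs; something along these lines (same device paths as in the script 
above):

grep . /sys/block/sd[a-z]/queue/max_sectors_kb
grep . /sys/block/sd[a-z]/queue/nr_requests
grep . /sys/block/sd[a-z]/device/queue_depth
cat /sys/block/md3/md/stripe_cache_size
blockdev --getra /dev/md3        # read-ahead, in 512-byte sectors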

--

# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
  Creation Time : Wed Aug 22 10:38:53 2007
 Raid Level : raid5
 Array Size : 1318680576 (1257.59 GiB 1350.33 GB)
  Used Dev Size : 146520064 (139.73 GiB 150.04 GB)
   Raid Devices : 10
  Total Devices : 10
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Sat Sep 29 13:05:15 2007
  State : active, resyncing
 Active Devices : 10
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 1024K

 Rebuild Status : 8% complete

   UUID : e37a12d1:1b0b989a:083fb634:68e9eb49 (local to host 
p34.internal.lan)
 Events : 0.4211

Number   Major   Minor   RaidDevice State
   0   8   330  active sync   /dev/sdc1
   1   8   491  active sync   /dev/sdd1
   2   8   652  active sync   /dev/sde1
   3   8   813  active sync   /dev/sdf1
   4   8   974  active sync   /dev/sdg1
   5   8  1135  active sync   /dev/sdh1
   6   8  1296  active sync   /dev/sdi1
   7   8  1457  active sync   /dev/sdj1
   8   8  1618  active sync   /dev/sdk1
   9   8  1779  active sync   /dev/sdl1
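
For reference, with a 1024k chunk across 10 devices a full raid5 stripe is 
fairly large; a quick calculation, assuming the layout above:

CHUNK_KB=1024
NR_DISKS=10
# raid5 stores one chunk of parity per stripe, so n-1 chunks hold data.
echo "full stripe = $(( CHUNK_KB * (NR_DISKS - 1) / 1024 )) MiB of data"   # 9 MiB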

--

NOTE: This bug is reproducible every time:

Example:

$ /usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:10:16:64

Writing with putc()...

It writes for 4-5 minutes and then... silence + D-state.  I was too late to 
catch it this time :(


$ ps auxww | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       276  1.2  0.0      0     0 ?        D    12:50   0:03 [pdflush]
root      2901  0.0  0.0   5916   632 ?        Ds   12:50   0:00 /sbin/syslogd -r
user      4571 48.0  0.0  11644  1084 pts/1    D+   12:51   1:55 /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:10:16:64
root      4612  1.0  0.0      0     0 ?        D    12:52   0:01 [pdflush]
root      4624  5.0  0.0  40964  7436 ?        D    12:55   0:00 /usr/bin/perl -w /app/rrd-cputemp/bin/rrd_cputemp.pl
root      4684  0.0  0.0  31968  1416 ?        D    12:55   0:00 /usr/bin/rateup
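
Next time it wedges I can also grab stack traces of the blocked tasks from the 
kernel log; roughly like this, assuming magic SysRq is enabled in the kernel 
config:

echo 1 > /proc/sys/kernel/sysrq   # make sure SysRq is enabled
echo t > /proc/sysrq-trigger      # dump the state and stack of every task to the kernel log
dmesg | tail -n 200               # the D-state tasks show up with their backtraces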
