Hi,
I am running a 32-bit 3.2.0-rc5 kernel with btrfs-tools 0.19.
I was having performance issues with BTRFS with fragmentation and
HDDs, so I decided to switch to an SSD to see if these would go away.
Performance was much better, but at times I would see a freeze that I
can't really explain, with the CPU spiking up to 100%.
I decided to try to reproduce this. It may or may not be related, but
while testing BTRFS performance I ran into an interesting problem:
performance depends on whether a file was freshly copied onto a BTRFS
filesystem or obtained via COW "children". This is all happening on a
Crucial M4 SSD, so something in the SSD firmware could be causing the
issue, but I feel it's related to BTRFS metadata.
Here is the test:
1. Write a fresh large file, called A, to the filesystem
2. Make a reflink (COW) copy of A, called B
3. Modify a set of random blocks in B
4. Remove A
5. Repeat 2-4, using the newly produced B as the new A
Expected results:
Each step takes the same amount of time to complete on an SSD, because
there is no fragmentation involved and the system is in the same state
at #2, since there is always only one file on the filesystem.
I used a 1GB file as my source. I repeated the tests using different
algorithms for the "write" in step #3 above.
Algorithm 1 (random): Write 8 bytes at random offsets
Algorithm 2 (fixed): Write the first 8 bytes at offset 0 and continue at 50k offsets
Algorithm 3 (incremental): Write the first 8 bytes at offset = random(50k),
then continue at 50k offsets
For each test, there were 40k writes total. The algorithm is in the Java code below.
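For reference, here is a minimal sketch of the three offset patterns exactly as described in the list above. It is separate from FileTest in the script further down; the class and method names (OffsetPatterns, randomOffset, fixedOffset, incrementalOffset) are made up for illustration only.

```java
import java.util.Random;

public class OffsetPatterns {
    public static final int STRIDE = 50000; // the "50k" step from the description
    public static final int WRITE_LEN = 8;  // each write is 8 bytes

    // Algorithm 1 (random): any offset that still leaves room for 8 bytes.
    public static long randomOffset(Random rng, long fileLength) {
        return Math.floorMod(rng.nextLong(), fileLength - WRITE_LEN + 1);
    }

    // Algorithm 2 (fixed): offsets 0, 50k, 100k, ...
    public static long fixedOffset(int i) {
        return (long) i * STRIDE;
    }

    // Algorithm 3 (incremental): same 50k stride, shifted by a start
    // value picked once per run from [0, 50k).
    public static long incrementalOffset(int start, int i) {
        return start + (long) i * STRIDE;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int start = rng.nextInt(STRIDE);
        for (int i = 0; i < 3; i++) {
            System.out.println("fixed=" + fixedOffset(i)
                    + " incremental=" + incrementalOffset(start, i)
                    + " random=" + randomOffset(rng, 1000000000L));
        }
    }
}
```

The only difference between the fixed and incremental patterns in this sketch is the per-run start value, so successive incremental runs land on different byte positions while fixed runs always hit the same ones.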
The following is observed with each iteration ONLY when using algorithm #3:
1. Over time, the time to modify the file increases
2. Over time, the time to make the reflink copy increases
3. Over time, the time to remove the file increases
4. The first few writes take less than the normal time to complete.
Data for 1st/5th/10th/15th/20th iteration (in seconds):
Algorithms 1 and 2:
Write: always 6
Copy: always 0.5
Remove: always 0.1
Algorithm 3:
Write: 2/6/9/10/11.5
Copy: 0.5/3/4.5/5.5/6
Remove: 0.1/1/2/2/2
As you can see, things degrade and taper off after the 10th iteration.
This probably has to do with 4k block size being near 50k/10. I don't
think this has to do with SSD garbage collection because I ran these
tests multiple times.
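One rough way to look at the block-size guess above is to count which 4k blocks the incremental pattern touches as iterations accumulate. This is only a sketch under stated assumptions: a 4096-byte block size, and the pattern as described (start = random(50k), then 50k steps, 40k writes of 8 bytes). The class name BlockCoverage and its constants are made up for illustration, and it models nothing about how BTRFS actually allocates extents or metadata.

```java
import java.util.BitSet;
import java.util.Random;

public class BlockCoverage {
    public static final int STRIDE = 50000;  // step between writes
    public static final int WRITES = 40000;  // writes per iteration
    public static final int BLOCK = 4096;    // ASSUMED filesystem block size
    public static final int WRITE_LEN = 8;   // length of "DEADBEEF"

    // Mark every block touched by one iteration that begins at `start`.
    public static void markIteration(BitSet touched, int start) {
        for (int i = 0; i < WRITES; i++) {
            long off = start + (long) i * STRIDE;
            int first = (int) (off / BLOCK);
            int last = (int) ((off + WRITE_LEN - 1) / BLOCK); // write may straddle two blocks
            touched.set(first, last + 1);
        }
    }

    public static void main(String[] args) {
        long span = (long) WRITES * STRIDE + STRIDE;
        int totalBlocks = (int) ((span + BLOCK - 1) / BLOCK);
        BitSet touched = new BitSet(totalBlocks);
        Random rng = new Random();
        for (int iter = 1; iter <= 20; iter++) {
            markIteration(touched, rng.nextInt(STRIDE));
            System.out.printf("iteration %d: %d of %d blocks touched%n",
                    iter, touched.cardinality(), totalBlocks);
        }
    }
}
```

Running it prints the cumulative number of distinct blocks rewritten after each iteration, which can be compared against the iteration at which the timings level off.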
To use this script, cd into an empty directory on a btrfs filesystem
and run it with "incremental" as the argument. You can use the other
modes to confirm the expected behavior.
Script used to produce the bug:
#!/bin/bash
mode=$1
if [ -z "$mode" ]; then
    echo "Usage: $0 <incremental|random|fixed>"
    exit 1
fi
src=`pwd`/test/src
dst=`pwd`/test/dst
srcfile=$src/test.tar
dstfile=$dst/test.tar
mkdir -p $src
mkdir -p $dst
filesize=100MB
# Build a 1GB file from a smaller download. You can tweak filesize and
# the loop below for lower bandwidth.
if [ ! -f $srcfile ]; then
    cd $src
    if [ ! -f $srcfile.dl ]; then
        wget http://download.thinkbroadband.com/${filesize}.zip \
            --output-document=$srcfile.dl
    fi
    rm -rf tarbase
    mkdir tarbase
    for i in {1..10}; do
        cp --reflink=always $srcfile.dl tarbase/$i.dl
    done
    tar -cvf $srcfile tarbase
    rm -rf tarbase
fi
cat <<END > $src/FileTest.java
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileTest {
    public static final int BLOCK_SIZE = 50000;
    public static final int MAX_ITERATIONS = 40000;

    public static void main(String args[]) throws IOException {
        String mode = args[0];
        RandomAccessFile f = new RandomAccessFile(args[1], "rw");
        //int offset = 0;
        int i;
        // initializer ONLY for incremental mode
        int offset = new java.util.Random().nextInt(BLOCK_SIZE);
        for (i = 0; i < MAX_ITERATIONS; i++) {
            try {
                int writeOffset;
                if (mode.equals("incremental")) {
                    writeOffset = new java.util.Random().nextInt(offset + i * BLOCK_SIZE);
                } else { // mode.equals random
                    writeOffset = new java.util.Random().nextInt(((int) f.length() - 100));
                    offset = writeOffset; // for reporting it at the end
                }
                f.seek(writeOffset);
                f.writeBytes("DEADBEEF");
            } catch (java.io.IOException e) {
                System.out.println("EOF");
                break;
            }
        }
        System.out.print("Last offset=" + offset);
        System.out.println(". Made " + i + " random writes.");
        f.close();
    }
}
END
cd $src
javac FileTest.java
/usr/bin/time --format 'rm: %E' rm -rf $dst/*
cp --reflink=always $srcfile.dl $dst/1.tst
cd $dst
for i in {1..20}; do
    echo -n "$i."
    i_plus=`expr $i + 1`
    /usr/bin/time --format 'write: %E' java -cp $src FileTest $mode $i.tst
    /usr/bin/time --format 'cp: %E' cp --reflink=always $i.tst $i_plus.tst
    /usr/bin/time --format 'rm: %E' rm $i.tst
    /usr/bin/time --format 'sync: %E' sync
    sleep 1
done