I have been playing around with relocating file data to improve boot time and 
app start-up time (e.g. OpenOffice) on reiser(fs/4).  This is done by 
monitoring the files accessed during boot/start-up, then copying these files 
into a single directory with sequential names (0001, 0002, ...) matching the 
access order.  Finally, the new files are hard linked (a rename should work 
too) back to the locations of the original files, replacing them.
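The copy-and-link step can be sketched in Python as below.  This is not the Ruby script used for the tests; it is a minimal illustration that assumes the staging directory is on the same file system as every file (hard links cannot cross file systems) and that the input list is already in access order.  The function name 'relocate' and the '.relocate-tmp' suffix are hypothetical.

```python
import os
import shutil
from pathlib import Path


def relocate(paths, staging_dir):
    """Hypothetical sketch: copy files into staging_dir as 0001, 0002, ...
    in the given (access) order, then hard link each copy back over the
    original path.  Assumes staging_dir is on the same file system as
    every path and that every path is a regular file."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for index, original in enumerate(paths, start=1):
        original = Path(original)
        copy = staging / f"{index:04d}"
        # Write a fresh copy so the file system can allocate the new data
        # in this order.  copy2 keeps mode and timestamps, not owner/group.
        shutil.copy2(original, copy)
        # Link the copy under a temporary name beside the original, then
        # rename it over the original so the path never goes missing.
        tmp = original.with_name(original.name + ".relocate-tmp")
        os.link(copy, tmp)
        os.replace(tmp, original)
```

Afterwards each original path and its numbered copy share one inode, so the data is stored only once, in whatever placement the file system chose for the numbered copies.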

As I understand it, both reiserfs and reiser4 assign keys to items based on the 
file name and the parent directory.  The file system then attempts to match 
block order with key order.  This is what lets the above trick place files in 
a specific order next to each other on disk.
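Whether the key ends up following name order or creation order, the zero padding in 0001, 0002, ... keeps the two consistent: the names sort lexicographically in the same order as their numbers.  A tiny Python illustration (the helper 'padded_names' is hypothetical):

```python
def padded_names(count, width=4):
    """Hypothetical helper: sequential names 0001, 0002, ...

    Zero padding makes byte-wise (lexicographic) order agree with numeric
    order; without it "10" would sort before "2"."""
    return [f"{i:0{width}d}" for i in range(1, count + 1)]


names = padded_names(12)
assert sorted(names) == names                  # padded: order preserved
unpadded = [str(i) for i in range(1, 13)]
print(sorted(unpadded)[:4])                    # ['1', '10', '11', '12']
```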

I am using readahead-watch on Ubuntu.  This little tool uses inotify to 
monitor all file accesses while it runs, and writes the accessed files to a 
text file sorted by disk order.  I have modified it so it can also write them 
in access order.  I then use a Ruby script to do the above copy and link using 
the output from readahead-watch.
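The relocation step only needs the accessed files deduplicated and kept in access order.  A minimal Python parser, assuming (not confirmed here) that the log holds one path per line; the name 'load_access_list' is hypothetical:

```python
def load_access_list(text):
    """Parse a file list assumed to hold one path per line, in access order.
    Blank lines and repeated paths are dropped; the first access wins."""
    seen = set()
    ordered = []
    for line in text.splitlines():
        path = line.strip()
        if path and path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered
```

Keeping only the first occurrence preserves the order in which each file was first needed, which is the order the sequential names should follow.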

I have done some tests on my Athlon 2200 laptop running reiserfs.  The hard 
drive is a 40GB Hitachi Travelstar 80GB with a maximum real transfer rate of 
25MB/s and an access time of 12ms.

The reiserfs partition is 36GB with 8.9GB used.

I used readahead-watch to create a readahead log during boot on Ubuntu Edgy, 
much like the default configuration with the "profile" boot option, except 
that it was set to record by access time and I manually killed it after the 
system had fully booted.  With this log used for readahead, the system booted 
in 2:15 from GRUB load to a usable desktop (auto login), as measured manually 
with a stopwatch.  After running the relocate script, the boot time with the 
same readahead log was 1:38.  I then reran readahead-watch during boot, set to 
sort by disk order, resulting in a boot time of 1:15.  I booted twice for each 
test to make sure the results were within a few seconds of each other.
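Converting the stopwatch readings to seconds puts the gains in proportion.  A short Python calculation using only the times above:

```python
def seconds(mmss):
    """Convert an 'm:ss' stopwatch reading to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


baseline = seconds("2:15")    # access-order log, before relocation
relocated = seconds("1:38")   # same log, after relocation
disk_order = seconds("1:15")  # after relocation, log re-recorded in disk order

for label, t in [("relocated", relocated), ("disk order", disk_order)]:
    saved = baseline - t
    print(f"{label}: {t} s, {saved} s saved ({100 * saved / baseline:.0f}% of {baseline} s)")
# relocated: 98 s, 37 s saved (27% of 135 s)
# disk order: 75 s, 60 s saved (44% of 135 s)
```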

I also used bootchart, but it didn't measure GNOME start-up and requires a 
bit of ambition to analyze thoroughly.  Still, it was evident that running the 
relocate script increased peak disk throughput from 6MB/s to 13MB/s and 
increased the average throughput rate.  But most of the boot time is still 
spent waiting on the disk.  My relocate script relocated 310MB of files.  If 
those were perfectly contiguous on disk, this drive should be able to load 
them in under 20s, though I expect only a fraction of that is actually 
accessed during boot.
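The "under 20s" figure follows from the drive's 25MB/s transfer rate, and adding the 12ms access time per seek shows why scattered data costs so much more.  A rough Python model (transfer time plus a fixed cost per seek, ignoring caching and rotational details); 'read_time_s' and the 1000-seek figure are illustrative assumptions, not measurements:

```python
def read_time_s(size_mb, rate_mb_s, seeks=0, access_ms=12.0):
    """Rough model: streaming transfer time plus a fixed access time per seek."""
    return size_mb / rate_mb_s + seeks * access_ms / 1000.0


print(f"{read_time_s(310, 25):.1f} s")              # fully contiguous: 12.4 s
print(f"{read_time_s(310, 25, seeks=1000):.1f} s")  # hypothetical 1000 seeks: 24.4 s
```

Under this model every thousand seeks costs about as much time as reading 300MB sequentially, which is why placement matters as much as raw throughput.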

Using 'filefrag', it is evident that the relocate script's attempt to place 
the files contiguously was a bit half-assed, but judging from the boot times 
it was clearly an improvement.
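Checking many files at once is easier when filefrag's per-file summaries are tallied.  A Python sketch that runs the e2fsprogs 'filefrag' command and parses lines of the form "path: N extents found" (older versions append ", perfection would be M extents"); the function names are hypothetical:

```python
import re
import subprocess

# Matches the summary part of a line, e.g. "/lib/libc.so.6: 3 extents found"
_SUMMARY = re.compile(r":\s+(\d+) extents? found")


def parse_filefrag(output):
    """Return {path: extent_count} from filefrag's summary lines."""
    counts = {}
    for line in output.splitlines():
        m = _SUMMARY.search(line)
        if m:
            counts[line[: m.start()]] = int(m.group(1))
    return counts


def extent_counts(paths):
    """Run filefrag on the given files and parse stdout.  check=False because
    filefrag may exit non-zero when some files cannot be mapped; those files
    simply do not appear in the result."""
    result = subprocess.run(["filefrag", *map(str, paths)],
                            capture_output=True, text=True, check=False)
    return parse_filefrag(result.stdout)
```

Summing the counts and subtracting the number of files gives the number of extra fragments.  Note that per-file extent counts alone do not show whether consecutive files are adjacent on disk.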

I also used readahead-watch to monitor the files accessed by OpenOffice Writer 
on startup.  The initial cold start time was 17s (about 0.5s variation from 
load to load).  A warm start (starting right after it was closed) took 3.6s.  
The results from readahead-watch were filtered through a script (using fuser) 
to remove all files that were already open when OpenOffice wasn't running.  
Running the relocate script on some of the X and GNOME libraries broke my 
system nicely until a reboot.  After running the relocate script, the cold 
start time became 14s.  When readahead-list was run on the same relocated 
files before starting OpenOffice, the load time was 6.5s.  Between runs, 
sudo sh -c "echo 1 > /proc/sys/vm/drop_caches" was used to drop the page 
cache, ensuring the files were actually read from disk.
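The fuser filtering step might look like the Python sketch below.  It relies on 'fuser -s', which is silent and exits with status 0 when at least one process is accessing the file.  The function names are hypothetical, and the check has to run while OpenOffice is not running.

```python
import subprocess


def in_use(path):
    """True if some process is currently accessing path, per `fuser -s`
    (silent; exit status 0 means at least one access was found)."""
    return subprocess.run(["fuser", "-s", str(path)]).returncode == 0


def app_only(accessed, is_in_use=in_use):
    """Keep, in their original order, only the files no process holds open
    right now.  Run while the application is NOT running, so files already
    open elsewhere on the desktop (e.g. X and GNOME libraries) are dropped."""
    return [p for p in accessed if not is_in_use(p)]
```

Passing the check in as 'is_in_use' keeps the ordering logic separate from the fuser call, so it can be tested without fuser installed.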

Of course, these results are highly dependent on how fragmented the files 
were beforehand and how effectively the relocation worked.  I expect others 
could reproduce the speedups, though by how much will vary.  I did these tests 
on my laptop with a slow hard drive so the results would be more evident.

I also did some tests with fresh reiserfs, reiser4, and ext3 file systems on a 
100MB loopback device to see how well each would take the hint to order data 
sequentially.  Creating ten 5MB files with sequential names on reiser4 
resulted in one fragment (measured by 'filefrag') for the whole set, probably 
due to a disk allocation bitmap: nearly perfect.  reiserfs would generally end 
up with 3-4 fragments for the same test, and ext3 didn't appear to make any 
real attempt to order the files sequentially on disk.
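The loopback test can be approximated with a small Python script that writes ten 5MiB files with sequential names, one after another, into a directory on the freshly made file system (creating the image, running mkfs and 'mount -o loop' are left out).  'filefrag' can then be run on the result.  The helper name is hypothetical, and flushing each file before creating the next is a choice made here, not necessarily what the original test did.

```python
import os
from pathlib import Path


def write_sequential_files(directory, count=10, size_mb=5, chunk=1 << 20):
    """Hypothetical helper: create count files named 01, 02, ... (zero padded
    to the width of count), each size_mb MiB, written in name order."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    block = b"\0" * chunk
    width = len(str(count))
    paths = []
    for i in range(1, count + 1):
        path = directory / f"{i:0{width}d}"
        with open(path, "wb") as f:
            for _ in range(size_mb * (1 << 20) // chunk):
                f.write(block)
            f.flush()
            # Push this file to disk before creating the next one.
            os.fsync(f.fileno())
        paths.append(path)
    return paths
```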

I have a 29GB reiser4 partition with 21GB used that I have been running for a 
few years now (since sometime before the release).  When I ran the same ten 
5MB file test on it, the result was 1000+ fragments in total (I didn't bother 
to count, but it was a lot).  But the files were allocated head to tail.  It's 
a bit scary to think the file system can't find a few MB of contiguous 
unallocated space on disk.  Clearly a repacker would be really nice.

Relocating file data to match pre-measured access patterns can clearly make a 
big performance difference.  Reiser(fs/4) provides an easy mechanism to hint 
at disk order, which can be used to measurably improve boot/start-up times.  
But I expect more can be done to achieve better results, including better 
measurement of read patterns and better allocation of the data.

I hope to rerun these tests with Reiser4 (maybe 4.1) on the same hardware.  I 
expect with a fresh (not fragmented) Reiser4 partition, the improvements will 
be more pronounced.  But a repacker should allow more reproducible results 
and nearly perfect data placement for boot and app start-up.

I hope that with reiser4, relocation, and the new Upstart (Ubuntu's sysvinit 
replacement) with good scripts, I can get this system to boot to usable in 
30 seconds, and slowoffice (aka OpenOffice) to load in 6s cold.  Am I being 
overoptimistic?

What about a mechanism to explicitly set or hint at item keys?  Maybe someday 
Linux packages could include preferred file order information that a file 
system like reiser4 could use to order the files on disk, resulting in fast 
load times without the need for the user to profile the app.

I think there is a lot to be said for measuring access patterns and using that 
to set keys in addition to deducing it from semantics using a fibration 
plug-in.

Thoughts?

--
Quinn Harris
