Otis Gospodnetic wrote:
I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really what
I'm after (finding a better/faster FS). What I'm wondering is about different
block sizes on a single (ext3) FS.
If I understand block sizes correctly, they represent a chunk of data that the
FS will read in a single read.
- If the block size is 1K, and Lucene needs to read 4K of data, then the disk
will have to do 4 reads, and will read in a total of 4K.
- If the block size is 4K, and Lucene needs to read 3K of data, then the disk
will have to do 1 read; only 3K of the data is needed, but the read will
actually transfer a full 4K, because that's the size of a block.
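To make sure I have the arithmetic straight, here's a rough back-of-the-envelope
sketch (the two cases above, nothing Lucene-specific):

public class BlockReadEstimate {

    // A partial block still costs a whole block read, so round up.
    static long blocksNeeded(long requestBytes, long blockSize) {
        return (requestBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long[][] cases = { { 4096, 1024 }, { 3072, 4096 } }; // {bytes wanted, block size}
        for (long[] c : cases) {
            long blocks = blocksNeeded(c[0], c[1]);
            System.out.println(c[0] + " bytes with " + c[1] + "-byte blocks: "
                    + blocks + " read(s), " + (blocks * c[1]) + " bytes actually transferred");
        }
    }
}

For the two cases above it prints 4 reads / 4096 bytes and 1 read / 4096 bytes,
which is what I'd expect.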
That's correct, Otis. Applications generally get the best performance
when they read data in the file system block size (or small multiples
thereof), which for ext2 and ext3 is almost always 4k. It might be
interesting to try making file systems with different block sizes and
see what the effect on performance is, and also, perhaps, trying larger
block sizes in Lucene, but always keeping Lucene's block size a multiple
of the file system block size. For an educated guess, I'd say that
4k/4k gives better performance than smaller file system block sizes, and
8k/4k is not likely to have much of an effect either way.
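If you want to experiment from the Java side, a throwaway probe along these
lines would do (the file path and buffer sizes are just placeholders, and the
OS page cache will skew the numbers, so use a test file bigger than RAM):

import java.io.FileInputStream;
import java.io.IOException;

public class BufferSizeProbe {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.dat"; // placeholder test file
        int[] bufferSizes = { 1024, 4096, 8192, 16384 };
        for (int size : bufferSizes) {
            byte[] buf = new byte[size];
            long start = System.currentTimeMillis();
            FileInputStream in = new FileInputStream(path);
            try {
                while (in.read(buf) != -1) {
                    // discard the data; we only care about the I/O time
                }
            } finally {
                in.close();
            }
            System.out.println(size + " byte buffer: "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}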
Does any of this sound right?
I recall Paul Elschot talking about disk reads and disk arm movement, and
Robert Engels talking about NIO and block sizes, so they might know more about
this stuff.
It depends very much on the type of disk: 15,000 rpm Ultra-320 SCSI
disks on a 64-bit PCI card will probably be faster than a 4200 rpm disk
in a laptop :-) Seriously, disk configuration makes a lot of
difference: striped RAID arrays will give the best I/O performance
(given a controller and whatnot that can exploit that). Once you get
into huge amounts of I/O there are other, more complex issues that affect
performance.
java.nio has the right features to exploit the I/O subsystem of the OS
to good advantage. We haven't done the performance measurements yet,
but memory-mapped I/O should yield the best performance (as well as
freeing you from worrying about what block size is best). It will
also be interesting to try the different I/O schedulers under Linux: cfq
is the default for the 2.6 kernel that Red Hat ships, but I can imagine
the deadline scheduler may give interesting results. As I say, at some
stage over the next few months we're likely to be looking at this in
more detail.
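As a rough sketch of what the memory-mapped route looks like in java.nio
(just an illustration, not what Lucene does internally):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadSketch {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "segment.dat"; // placeholder file
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = raf.getChannel();
            // Map the whole file; the OS pages data in as it is touched,
            // so there is no explicit buffer or block size to choose.
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (map.hasRemaining()) {
                sum += map.get();
            }
            System.out.println("read " + channel.size() + " bytes, checksum " + sum);
        } finally {
            raf.close();
        }
    }
}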
The one thing that makes more difference than anything else, though, is
locality of reference; this seems to be well understood by the Lucene index
format and is probably why the performance is generally good!
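A crude way to see the effect is to read the same number of 4k blocks
sequentially and then at random offsets (my own throwaway sketch; it assumes
the test file is comfortably bigger than reads * 4k, and a warm page cache
will narrow the gap):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class LocalityProbe {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.dat"; // placeholder file
        int reads = 10000;
        byte[] buf = new byte[4096]; // one typical file-system block per read

        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            long length = raf.length();

            long t0 = System.currentTimeMillis();
            raf.seek(0);
            for (int i = 0; i < reads; i++) {
                raf.read(buf); // sequential: the next block each time
            }
            long sequential = System.currentTimeMillis() - t0;

            Random rnd = new Random(42);
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < reads; i++) {
                raf.seek((long) (rnd.nextDouble() * (length - buf.length)));
                raf.read(buf); // random: a seek before every block
            }
            long random = System.currentTimeMillis() - t1;

            System.out.println("sequential: " + sequential
                    + " ms, random: " + random + " ms");
        } finally {
            raf.close();
        }
    }
}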
jch