Otis Gospodnetic wrote:

I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really what 
I'm after (finding a better/faster FS).  What I'm wondering is about different 
block sizes on a single (ext3) FS.
If I understand block sizes correctly, they represent a chunk of data that the 
FS will read in a single read.
- If the block size is 1K, and Lucene needs to read 4K of data, then the disk 
will have to do 4 reads, and will read in a total of 4K.
- If the block size is 4K, and Lucene needs to read 3K of data, then the disk 
will have to do 1 read, and Lucene will get its 3K, although the read will 
actually pull in 4K, because that's the size of a block.
That's correct, Otis. Applications generally get the best performance when they read data in the file system block size (or small multiples thereof), which for ext2 and ext3 is almost always 4k. It might be interesting to make file systems with different block sizes and see what the effect on performance is, and also, perhaps, to try larger block sizes in Lucene, while always keeping Lucene's block size a multiple of the file system block size. As an educated guess, I'd say that 4k/4k gives better performance than smaller file system block sizes, and 8k/4k is not likely to have much of an effect either way.
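
As a purely illustrative sketch, assuming a 4k file system block size, a reader that always asks for one whole block per read() call looks something like the following in plain Java; the 4096 constant and the class name are just assumptions for the example, not anything Lucene actually does:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class BlockAlignedReader {
        // Assumed file system block size; ext2/ext3 almost always use 4k.
        private static final int FS_BLOCK_SIZE = 4096;

        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            try {
                byte[] buf = new byte[FS_BLOCK_SIZE];
                int n;
                // Sequential reads of exactly one block each start on block
                // boundaries, so the OS never touches more blocks than needed.
                while ((n = raf.read(buf)) != -1) {
                    // process buf[0..n) here
                }
            } finally {
                raf.close();
            }
        }
    }

Reading in, say, 8k or 16k chunks is then just a matter of making the buffer a multiple of FS_BLOCK_SIZE, which is the "small multiples thereof" case above.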

Does any of this sound right?
I recall Paul Elschot talking about disk reads and disk arm movement, and 
Robert Engels talking about NIO and block sizes, so they might know more about 
this stuff.
It depends very much on the type of disk: 15,000 rpm Ultra-320 SCSI disks on a 64-bit PCI card will probably be faster than a 4200 rpm disk in a laptop :-) Seriously, disk configuration makes a lot of difference: striped RAID arrays will give the best I/O performance (given a controller and whatnot that can exploit that). Once you get into huge amounts of I/O there are other, more complex issues that affect performance.

java.nio has the right features to exploit the I/O subsystem of the OS to good advantage. We haven't done the performance measurements yet, but memory-mapped I/O should yield the best performance (as well as freeing you from worrying about what block size is best). It will also be interesting to try the different I/O schedulers under Linux: cfq is the default for the 2.6 kernel that Red Hat ships, but I can imagine the deadline scheduler may give interesting results. As I say, at some stage over the next few months we're likely to be looking at this in more detail.
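
To make the memory-mapped route concrete, here is a minimal sketch of what that looks like with java.nio's FileChannel.map; this is not Lucene's own Directory code, just the raw API call, and the class name is made up for the example:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapSketch {
        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            FileChannel channel = raf.getChannel();
            try {
                // Map the whole file read-only; the kernel's page cache then
                // decides what to fetch and when, so the application no longer
                // has to guess at the "right" block size.
                MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                // Random access is just buffer positioning, no explicit read()s.
                byte first = map.get(0);
                System.out.println("first byte: " + first);
            } finally {
                channel.close();
                raf.close();
            }
        }
    }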

The one thing that makes more difference than anything else, though, is locality of reference; this seems to be well understood by the Lucene index format and is probably why the performance is generally good!

jch
