But that would mean we should be using buffers of at least 250k for the IndexInput? Not the 16k or so that is the default.

Is the OS smart enough to figure out that the file is being read sequentially, and to adjust its physical read size to 256k based on the other concurrent IO operations? That seems hard for it to figure out without performing poorly in the general case.

On Feb 8, 2008, at 11:25 AM, Doug Cutting wrote:

Michael McCandless wrote:
Merging is far more IO intensive.  With mergeFactor=10, we read from
40 input streams and write to 4 output streams when merging the
tii/tis/frq/prx files.

If your disk can transfer at 50MB/s, and takes 5ms/seek, then 250kB reads and writes are the break-even point, where half the time is spent seeking and half transferring, and throughput is 25MB/s. With 44 files open, that means the OS needs just 11MB of buffering to keep things above this threshold. Since most systems have considerably larger buffer pools than 11MB, merging with mergeFactor=10 shouldn't be seek-bound.

Doug
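A minimal Java sketch of the seek-versus-transfer arithmetic in Doug's message, assuming decimal units (1 MB = 10^6 bytes) and exactly one seek per read. The class and method names are hypothetical illustrations, not Lucene API. It also applies the same model to a 16 kB read size for comparison with the default buffer size mentioned in the reply.

```java
// Back-of-the-envelope model of seek vs. transfer time.
// Assumptions: decimal units, one seek before every read, no OS readahead.
final class SeekBoundMath {

    // Read size at which one seek costs as much time as transferring the data:
    // seekSeconds = bytes / bytesPerSecond  =>  bytes = bytesPerSecond * seekSeconds
    static double breakEvenBytes(double bytesPerSecond, double seekSeconds) {
        return bytesPerSecond * seekSeconds;
    }

    // Effective throughput when each read of readBytes is preceded by one seek.
    static double effectiveThroughput(double readBytes, double bytesPerSecond, double seekSeconds) {
        double seconds = seekSeconds + readBytes / bytesPerSecond;
        return readBytes / seconds;
    }

    public static void main(String[] args) {
        double rate = 50e6;   // 50 MB/s transfer rate (figure from the message)
        double seek = 0.005;  // 5 ms per seek (figure from the message)
        int openFiles = 44;   // 40 input + 4 output streams at mergeFactor=10

        double breakEven = breakEvenBytes(rate, seek);
        System.out.printf("break-even read size: %.0f kB%n", breakEven / 1e3);
        System.out.printf("throughput at break-even: %.1f MB/s%n",
                effectiveThroughput(breakEven, rate, seek) / 1e6);
        System.out.printf("buffering for %d files: %.1f MB%n",
                openFiles, openFiles * breakEven / 1e6);

        // Same model with 16 kB reads (assumed 16,000 bytes) for comparison.
        System.out.printf("throughput at 16 kB reads: %.1f MB/s%n",
                effectiveThroughput(16e3, rate, seek) / 1e6);
    }
}
```

Running it reproduces the figures above (250 kB break-even, 25 MB/s, 11 MB of buffering for 44 files) and, under the same one-seek-per-read assumption, gives about 3 MB/s for 16 kB reads. That gap is only a worst case: it ignores any readahead or caching the OS does on top of the application's buffer.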


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
