> Don't want to start a buffer size war, but these have always seemed too small to me. I'd recommend upping both InputStream and OutputStream buffer sizes to at least 4k, as this is the cluster size on most disks these days, and also a common VM page size.
Okay.
> Reading and writing in smaller quantities than these is definitely suboptimal.
This is not obvious to me. Can you provide Lucene benchmarks which show this? Modern filesystems have extensive caches, perform read-ahead, and delay writes, so file-level system calls do not correspond closely to physical disk operations.
To my thinking, the primary role of file buffering in Lucene is to minimize the overhead of the system call itself, not to minimize physical I/O operations. Once the system-call overhead is made insignificant, larger buffers offer little measurable improvement.
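A rough sketch of the sort of measurement that would settle this, using a plain java.io read loop (the file path, buffer sizes, and class name here are placeholders, not Lucene code):

    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical micro-benchmark: read the same file through different
    // buffer sizes and compare wall-clock time. With the file in the OS
    // cache, differences mostly reflect per-system-call overhead, not disk.
    public class BufferSizeBench {
        public static void main(String[] args) throws IOException {
            String path = args[0];  // e.g. a large segment file from an index
            int[] sizes = {128, 1024, 4096, 16384};
            for (int size : sizes) {
                byte[] buffer = new byte[size];
                long start = System.currentTimeMillis();
                FileInputStream in = new FileInputStream(path);
                try {
                    while (in.read(buffer) != -1) {
                        // discard the data; we are timing the reads only
                    }
                } finally {
                    in.close();
                }
                long elapsed = System.currentTimeMillis() - start;
                System.out.println(size + " bytes: " + elapsed + " ms");
            }
        }
    }

Running it a second time, so the file is warm in the filesystem cache, isolates the system-call overhead from disk activity; if the times flatten out above a kilobyte or so, that supports the view that larger buffers buy little.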
Also, we cannot increase these sizes blindly. Buffers are the largest source of per-query memory allocation in Lucene, with one buffer (or two, for phrases and spans) allocated for every query term. Folks whose applications perform wildcard queries have encountered out-of-memory exceptions with the current buffer size.
Possibly one could implement a term wildcard mechanism which does not require a buffer per term, or perhaps one could allocate small buffers for infrequent terms (the vast majority). If such changes were made, then it might be feasible to bump up the buffer size somewhat. But, back to my first point, one must first show that larger buffers offer significant performance improvements.
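To put numbers on the memory concern, here is an illustrative back-of-the-envelope calculation (the 50,000-term expansion is hypothetical; the buffer sizes are the current 1k and the proposed 4k):

    // Illustrative estimate of per-query buffer memory for an expanded
    // wildcard query. The term count is hypothetical, not measured.
    public class BufferMemoryEstimate {
        public static void main(String[] args) {
            int terms = 50000;           // hypothetical wildcard expansion
            int buffersPerTerm = 1;      // 2 for phrases and spans
            long oneK  = (long) terms * buffersPerTerm * 1024;  // current size
            long fourK = (long) terms * buffersPerTerm * 4096;  // proposed size
            System.out.println("1k buffers: " + (oneK  >> 20) + " MB");  // ~48 MB
            System.out.println("4k buffers: " + (fourK >> 20) + " MB");  // ~195 MB
        }
    }

At the current 1k size such a query already ties up tens of megabytes; at 4k it approaches 200 MB per in-flight query, which is why small buffers for infrequent terms would be a prerequisite for any increase.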
Doug
