Oh, it certainly causes some random access--I don't deny that. I
just want to emphasize that this isn't at all the same as purely
"random writes", which would be expected to perform an order of
magnitude slower.
Just did a test where I wrote out a 1 GB file in 1 KB chunks, then
wrote it out as 2 files with alternating 512-byte chunks, then as 4
files with 256-byte chunks. Some speed is lost--perhaps 10% at each
doubling--but the speed is still essentially "sequential" speed. You
can get back the original performance by using consistently-sized
chunks (1 KB to each file, round-robin).
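A rough sketch of the kind of test described above (file names, the smaller default size, and the timing approach are illustrative, not the original benchmark):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class ChunkWriteTest {

    // Write `total` bytes round-robin across `nFiles` files in
    // `chunk`-byte chunks; return elapsed nanoseconds.
    static long timeWrite(long total, int nFiles, int chunk) throws IOException {
        byte[] buf = new byte[chunk];
        FileOutputStream[] outs = new FileOutputStream[nFiles];
        File[] files = new File[nFiles];
        for (int i = 0; i < nFiles; i++) {
            files[i] = new File("chunktest" + i + ".dat");
            outs[i] = new FileOutputStream(files[i]);
        }
        long start = System.nanoTime();
        long written = 0;
        int f = 0;
        while (written < total) {
            outs[f].write(buf);          // alternate between the files
            written += chunk;
            f = (f + 1) % nFiles;
        }
        for (int i = 0; i < nFiles; i++) {
            outs[i].getFD().sync();      // make sure the data hits the disk
            outs[i].close();
            files[i].delete();           // clean up the test files
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        long total = 1L << 26;           // 64 MB here; 1 GB in the original test
        System.out.println("1 file /1K chunks:   " + timeWrite(total, 1, 1024) + " ns");
        System.out.println("2 files/512B chunks: " + timeWrite(total, 2, 512) + " ns");
        System.out.println("4 files/256B chunks: " + timeWrite(total, 4, 256) + " ns");
    }
}
```

Each variant writes the same total volume, so any slowdown in the multi-file runs reflects the interleaving rather than the amount of data.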
HDD controllers are actually quite good at batching writes into
sequential order. Why else do you think sync() takes so long :)
-Mike
On 7-Feb-08, at 3:35 PM, robert engels wrote:
I don't think that is true - though I'm probably wrong :).
My understanding is that several files are written in parallel
(during the merge), causing random access. After the files are
written, they are all reread and written as a CFS file
(essentially sequential - although interleaving the read and the
write is going to cause head movement).
The code:
private IndexOutput tvx, tvf, tvd; // To write term vectors
private FieldsWriter fieldsWriter;
is my clue that several files are written at once.
On Feb 7, 2008, at 5:19 PM, Mike Klaas wrote:
On 7-Feb-08, at 2:00 PM, robert engels wrote:
My point is that commit needs to be used in most applications,
and the commit in Lucene is very slow.
You don't have 2x the IO cost, mainly because only the log file
needs to be sync'd. The index only has to be sync'd eventually,
in order to prune the log file - this can be done in the
background, improving the performance of the update and commit cycle.
Also, writing the log file is very efficient because it is an
append/sequential operation. Writing the segment files means
writing multiple files - essentially causing random access writes.
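A hedged sketch of the commit pattern described here (this is not Lucene's actual API; the class and method names are invented for illustration): the update appends a record to the log and fsyncs only that file, while the segment files can be sync'd later in the background, after which the log can be pruned.

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class LoggedCommit {
    private final FileOutputStream log;

    public LoggedCommit(String logPath) throws IOException {
        this.log = new FileOutputStream(logPath, true);  // append-only log
    }

    // Fast commit path: one sequential append plus one fsync on the log.
    public void commit(byte[] record) throws IOException {
        log.write(record);
        log.getFD().sync();      // durability comes from the log alone
    }

    // Called later (e.g. from a background thread) once the segment
    // files themselves have been sync'd; the log entries are then
    // redundant and the log can be emptied.
    public void pruneLog() throws IOException {
        log.getChannel().truncate(0);
    }
}
```

The fast path touches a single sequentially-written file, which is why the per-commit cost stays low even though the index files are written lazily.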
For large segments, multiple sequentially-written large files
should perform similarly to one large sequentially-written file.
It is only close to random access on the smallest segments (which
a sufficiently-large flush-by-ram shouldn't produce).
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]