I doubt this will make Lucene much faster, since Lucene already implements buffering in its InputStream and OutputStream classes. So Lucene already has this optimization built-in.

Doug

Andrzej Bialecki wrote:
Hello,

Since you are trying this anyway, and looking for ways to improve indexing times... Could you perhaps try to replace use of java.io.RandomAccessFile in FSDirectory implementation, with the attached implementation? It supposedly increases I/O throughput by orders of magnitude, by using partial buffering.

Terry Steichen wrote:

Mike,

By way of comparison, I've got a collection of about 50,000 XML files, each
of which averages about 8K. It takes about 1.25 hours to index (on a 1.8Ghz
machine). I use basically the standard configuration (mergeFactor, etc.)
and I've got about 30 fields per document. I add about 200 new ones per
day. I don't recall how long that it takes to index the 200 (I do it
through a background task), but it takes a couple of minutes to merge the
new 200 document index with the master index.


HTH,

Terry

----- Original Message -----
From: "Michael Barry" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, February 24, 2003 2:00 PM
Subject: Indexing Tips and Hints



All,
  I'm in need of some pointers, hints or tips on indexing large


collections

of data. I know I saw some tips on this list before but when I tried
searching
the list, I came up blank.
I have a large collection of XML files (336000 files around 5K
apiece) that I'm
indexing and its taking quite a bit of time (27 hours). I've played
around with the
mergeFactor, RAMDirectories and multiple threads (X number of threads
indexing
a subset of the data and then merging the indexes at the end) but I
cannot seem
to bring the time down. I'm probably not doing these things properly but
from
what I read I believe I am. Maybe this is the best I can do with this
data but I
would be really grateful to hear how others have tackled this same issue.
As always pointers to places in the mailing list archive or other
places would be
appreciated.


Thanks, Mike.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]






------------------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to