Mike,

By way of comparison, I've got a collection of about 50,000 XML files,
each of which averages about 8K.  It takes about 1.25 hours to index
(on a 1.8GHz machine).  I use basically the standard configuration
(default mergeFactor, etc.) and have about 30 fields per document.  I
add about 200 new documents per day.  I don't recall how long it takes
to index those 200 (I do it through a background task), but it takes a
couple of minutes to merge the new 200-document index into the master
index.
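
The merge step itself is just IndexWriter.addIndexes().  Something
along these lines should do it (a sketch off the top of my head, so
treat the paths and the StandardAnalyzer as placeholders for whatever
you actually use):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Open the small index built by the background task.
    Directory daily = FSDirectory.getDirectory("/indexes/daily", false);

    // Open the master index for appending (create == false).
    IndexWriter master =
        new IndexWriter("/indexes/master", new StandardAnalyzer(), false);

    // Copy the daily index into the master.  addIndexes() also
    // optimizes the result, which is likely where the couple of
    // minutes go.
    master.addIndexes(new Directory[] { daily });
    master.close();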

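On the RAMDirectory approach you mention below: the pattern I've seen
recommended is to build batches of documents in a RAMDirectory and
periodically flush them into the on-disk index, raising mergeFactor on
the disk writer so it merges less often.  A rough sketch, with made-up
values -- numFiles, textOfFile(), the batch size of 1,000, and the
mergeFactor of 50 are all stand-ins you'd want to tune, and the XML
parsing is elided:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    IndexWriter disk =
        new IndexWriter("/indexes/master", new StandardAnalyzer(), true);
    disk.mergeFactor = 50;               // fewer, larger on-disk merges

    RAMDirectory ram = new RAMDirectory();
    IndexWriter ramWriter =
        new IndexWriter(ram, new StandardAnalyzer(), true);

    for (int i = 0; i < numFiles; i++) { // your loop over the XML files
        Document doc = new Document();
        doc.add(Field.Keyword("id", String.valueOf(i)));
        doc.add(Field.Text("contents", textOfFile(i))); // parsing elided
        ramWriter.addDocument(doc);

        if (i % 1000 == 999) {           // flush each 1,000-doc batch
            ramWriter.close();
            disk.addIndexes(new Directory[] { ram });
            ram = new RAMDirectory();
            ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
        }
    }
    ramWriter.close();
    disk.addIndexes(new Directory[] { ram }); // flush the last batch
    disk.optimize();
    disk.close();

Whether this beats a plain IndexWriter with a bigger mergeFactor will
depend on your data -- each addIndexes() call optimizes, so very small
batches get expensive -- but the batch size and mergeFactor are the two
knobs worth experimenting with.
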
HTH,

Terry

----- Original Message -----
From: "Michael Barry" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, February 24, 2003 2:00 PM
Subject: Indexing Tips and Hints


> All,
>    I'm in need of some pointers, hints, or tips on indexing large
> collections of data.  I know I saw some tips on this list before, but
> when I tried searching the list, I came up blank.
>    I have a large collection of XML files (336,000 files, around 5K
> apiece) that I'm indexing, and it's taking quite a bit of time (27
> hours).  I've played around with the mergeFactor, RAMDirectories, and
> multiple threads (X number of threads each indexing a subset of the
> data and then merging the indexes at the end), but I cannot seem to
> bring the time down.  I'm probably not doing these things properly,
> but from what I read I believe I am.  Maybe this is the best I can do
> with this data, but I would be really grateful to hear how others
> have tackled this same issue.
>    As always, pointers to places in the mailing list archive or other
> places would be appreciated.
>
> Thanks, Mike.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
