Things to consider:
- disk speed, and whether the disk is busy satisfying other processes'
  requests
- CPU speed
- amount of free RAM in the machine, and the amount of RAM given to your JVM
- where the bottleneck actually is - it could be a slow XML parser, for
  instance; profile it (see the sketch below)
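
To find the bottleneck, it helps to time the XML parsing and the Lucene
addDocument() calls separately. Below is a rough sketch against the classic
Lucene API (the index path, the field names, and parseXml() are placeholders,
and exact class and field names vary between Lucene versions). Raising
mergeFactor above the default of 10 trades more open files for fewer merges
while indexing:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexTimer {

        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/index",
                                                 new StandardAnalyzer(),
                                                 true);        // create a new index
            writer.mergeFactor = 50;    // default is 10; fewer, larger merges

            long parseTime = 0, indexTime = 0;
            File[] files = new File(args[0]).listFiles();
            for (int i = 0; i < files.length; i++) {
                long t0 = System.currentTimeMillis();
                String text = parseXml(files[i]);   // placeholder for your XML parsing
                long t1 = System.currentTimeMillis();

                Document doc = new Document();
                doc.add(Field.Text("contents", text));
                doc.add(Field.Keyword("path", files[i].getPath()));
                writer.addDocument(doc);
                long t2 = System.currentTimeMillis();

                parseTime += (t1 - t0);
                indexTime += (t2 - t1);
            }
            writer.optimize();
            writer.close();
            System.out.println("parsing: " + parseTime
                               + " ms, indexing: " + indexTime + " ms");
        }

        private static String parseXml(File f) {
            // stand-in for whatever XML parsing you do today
            return "";
        }
    }

If parsing dominates, no amount of Lucene tuning will bring the 27 hours
down much; if addDocument() dominates, mergeFactor and the RAM available
to the JVM are the first knobs to look at.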

I'm about to submit another Lucene article to Onjava.com.  It covers
indexing performance.  I don't know exactly when it will be published,
but when it is I'll send the URL to the list.

Otis



--- Michael Barry <[EMAIL PROTECTED]> wrote:
> All,
>    I'm in need of some pointers, hints, or tips on indexing large
> collections of data. I know I saw some tips on this list before, but
> when I tried searching the list I came up blank.
>    I have a large collection of XML files (336,000 files, around 5K
> apiece) that I'm indexing, and it's taking quite a bit of time (27
> hours). I've played around with the mergeFactor, RAMDirectories, and
> multiple threads (X threads each indexing a subset of the data and then
> merging the indexes at the end), but I cannot seem to bring the time
> down. I'm probably not doing these things properly, but from what I read
> I believe I am. Maybe this is the best I can do with this data, but I
> would be really grateful to hear how others have tackled this same
> issue.
>    As always, pointers to places in the mailing list archive or other
> places would be appreciated.
> 
> Thanks, Mike.
> 
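
For reference, the "index subsets in parallel, then merge at the end"
approach described above usually looks roughly like the sketch below
(classic Lucene API again; the thread count, the output path, and the
per-thread document loop are placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class ParallelIndexer {

        public static void main(String[] args) throws Exception {
            final int threads = 4;                  // placeholder thread count
            final Directory[] partial = new Directory[threads];
            Thread[] workers = new Thread[threads];

            for (int i = 0; i < threads; i++) {
                final int id = i;
                partial[id] = new RAMDirectory();   // each thread gets its own in-memory index
                workers[i] = new Thread() {
                    public void run() {
                        try {
                            IndexWriter w = new IndexWriter(partial[id],
                                                            new StandardAnalyzer(),
                                                            true);
                            w.mergeFactor = 50;
                            // add this thread's subset of the XML files here
                            w.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                };
                workers[i].start();
            }
            for (int i = 0; i < threads; i++) {
                workers[i].join();
            }

            // a single writer merges all partial indexes into one on-disk index
            IndexWriter merged = new IndexWriter("/tmp/big-index",
                                                 new StandardAnalyzer(),
                                                 true);
            merged.addIndexes(partial);
            merged.optimize();
            merged.close();
        }
    }

One thing to watch with RAMDirectory-backed partial indexes is heap:
336,000 documents will not all fit in memory at once, so a common variant
is to flush each thread's RAMDirectory into its own on-disk index whenever
it grows past some threshold, and let the final addIndexes() merge those
on-disk indexes instead.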


