> From: Dave Kor [mailto:[EMAIL PROTECTED]]
>
> This leads me to yet another of my burning questions.
> Has anyone pushed Lucene to its limits yet? If so,
> what are they? What happens when Lucene hits its limit?
> Does it throw an exception? Core dump?

There are many limits that could be hit. Lucene is designed so that hard limits are hard to hit. It caches only a few critical data structures in memory, to avoid hitting the JVM's heap size limit, and relies instead on the file system's caches for performance.

Lucene uses 63-bit file pointers, so it will be a long time before raw index size is a limit. However, filesystems that do not support files larger than, e.g., 2GB will limit things. Document and term numbers are 31-bit, so two billion documents or terms is another limit that will probably not be hit too soon.

Performance for large indices is frequently governed by I/O performance. If an index is larger than RAM, then searches will need to read data from disk. A search for a term that occurs in a million documents can require over 1MB of data, which can take some time to read. With multiple searching threads, the disk can easily become a bottleneck. Disk arrays can alleviate this; more RAM helps even more!

For some folks, queries that take over a second are unacceptable; for others, ten seconds is okay. Performance should be more-or-less linear: a two-million document index will be almost twice as slow to search as a one-million document index. There are lots of factors, including document size, CPU speed, RAM size, and I/O subsystem, but a rough rule of thumb for Lucene performance might be that, in a "typical" configuration, it can search a million documents per second.

So if you need to search 20 million 100kB documents on a 100MHz 386 with 8MB of RAM with sub-second response time, Lucene will probably fail. But if you need to search two million 2kB documents on a 500MHz Pentium with 128MB of RAM in a couple of seconds per query, you're probably okay.

Doug

_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-dev
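
The figures in the reply above can be turned into a quick back-of-envelope check. The following is only a rough sketch, not anything taken from Lucene itself: the class name, constants, and method names are all hypothetical, and the estimate assumes the "million documents per second" rule of thumb and linear scaling hold, ignoring document size, CPU, RAM, and disk.

```java
/** Hypothetical helper illustrating the limits and rule of thumb quoted in the message. */
final class LuceneLimitsSketch {
    // 63-bit file pointers: the largest value a signed 64-bit long can hold.
    static final long MAX_FILE_POINTER = Long.MAX_VALUE;

    // 31-bit document and term numbers: the largest value a signed 32-bit int can hold.
    static final int MAX_DOC_OR_TERM_NUMBER = Integer.MAX_VALUE;

    // Assumption taken from the message: roughly one million documents searched per second.
    static final double DOCS_PER_SECOND = 1_000_000.0;

    /** Estimated seconds per query, assuming search time grows linearly with document count. */
    static double estimatedQuerySeconds(long docCount) {
        return docCount / DOCS_PER_SECOND;
    }

    /** True if a single index file of this size fits under the filesystem's per-file cap. */
    static boolean fitsFilesystem(long indexFileBytes, long maxFileBytes) {
        return indexFileBytes <= maxFileBytes;
    }
}
```

Under these assumptions, `estimatedQuerySeconds(20_000_000L)` comes to about 20 seconds and `estimatedQuerySeconds(2_000_000L)` to about 2 seconds, in line with the two scenarios in the message. A 3GB index file would fail `fitsFilesystem` against a 2GB per-file cap long before the 63-bit pointer limit matters.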
