Hi Ori,

Before taking drastic rehosting measures and introducing the software complexity of splitting your application into pieces running on separate machines, I'd recommend looking at how your document data is distributed and how you're searching it. Here are some questions that may help you find a less complex solution:

- Is your high ratio of unique terms to documents due to a unique identifier in each document? If so, are you performing wildcard or range searches on that field? (The sketch after this list shows one way to check the term distribution per field.)

- Are your queries "canned", i.e. hard-coded in form, or are they "ad hoc", coming from users?

- Do your queries refer to every field you've indexed? On a similar note, does your application use every field you've indexed or stored in Lucene?

- How many documents do your queries typically hit? How many of those hits do you actually use?

- How important is it that queries run against up-to-the-second data? In other words, would the hits be just about as useful if updates were batched into a few runs per day instead of applied continuously?
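
If it would help to see where those unique terms are coming from, here's a rough sketch of a small diagnostic that walks the term dictionary and counts distinct terms per field. It assumes the Lucene 2.9/3.x-era IndexReader/TermEnum API and a made-up index path ("/path/to/index"), so adjust it for your setup:

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

// Counts distinct terms per field so you can see which fields
// (e.g. a unique-ID field) dominate the term dictionary.
public class TermDistribution {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(
                FSDirectory.open(new File("/path/to/index"))); // hypothetical path
        System.out.println("documents: " + reader.numDocs());

        Map<String, Integer> termsPerField = new HashMap<String, Integer>();
        TermEnum terms = reader.terms(); // all terms, ordered by field, then text
        while (terms.next()) {
            Term t = terms.term();
            Integer n = termsPerField.get(t.field());
            termsPerField.put(t.field(), n == null ? 1 : n + 1);
        }
        terms.close();
        reader.close();

        for (Map.Entry<String, Integer> e : termsPerField.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " unique terms");
        }
    }
}

If one field (say, a primary-key-style identifier) accounts for the bulk of the terms, that's usually the first place to look before splitting the index across machines.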


One of the things I really like about Lucene is that one can quickly whip up an application and it basically works. But, as with most databases, small differences in organization can produce disproportionately large differences in performance once there are millions of rows/records/entries. A little time spent examining data distribution and access patterns can go a long way.

Good luck!

--MDC
