On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe < [email protected]> wrote:
> Can I create an index file with very large size, like 1 TB or so? Is there
> any limit on how large index file one can create? Also, will I be able to
> search on this 1 TB index file at all?
>

Leaving aside the question of hardware or JVM limits on monstrous files, this question (can you search this file) is easier: if you've got, say, ten billion documents in one index, and you have a query which is going to hit maybe even just 0.1% of the documents, you'll need to score 10 million hits in the course of that query. To do this in under a second means you only have 100 nanoseconds to look at each document. If your query hits 1% of your documents (100 million hits), you're down to 10 ns per document.

I've never tried searching a 1TB index, but I'd say that's pushing it. Is there a reason you can't shard your index, and instead put maybe 20 shards of 50GB (or better, 100 shards of 10GB) each on a variety of machines, and just merge results?

  -jake
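
For what it's worth, here is a rough Python sketch of the per-hit time budget and of the "just merge results" step. It is not Lucene API: the function names are made up for illustration, and it assumes each shard returns its hits as (score, doc_id) tuples already sorted by descending score, with scores that are comparable across shards (which in practice depends on how each shard computes its term statistics).

```python
import heapq
from itertools import islice

def per_hit_budget_ns(total_docs, hit_fraction, time_budget_s=1.0):
    """Nanoseconds available per matching document within the latency budget."""
    hits = total_docs * hit_fraction
    return time_budget_s * 1e9 / hits

def merge_shard_results(shard_results, k):
    """Merge per-shard hit lists into a global top k.

    Assumption: every list in shard_results holds (score, doc_id) tuples
    sorted by descending score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))

if __name__ == "__main__":
    # Ten billion documents, query matching 0.1% and then 1% of them.
    print(per_hit_budget_ns(10_000_000_000, 0.001))
    print(per_hit_budget_ns(10_000_000_000, 0.01))

    # Hypothetical hits from three shards.
    shard_a = [(9.1, "a7"), (4.2, "a3")]
    shard_b = [(8.5, "b1"), (7.7, "b9"), (1.0, "b2")]
    shard_c = [(6.3, "c4")]
    print(merge_shard_results([shard_a, shard_b, shard_c], k=3))
```

Each shard only needs to return its own top k hits, so the merging node handles at most k times the number of shards entries, however large the full index is.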
