Generally you shouldn't hit OOM, but it depends on how you use the index. For example, if you have millions of documents spread across the 100 GB and you sort on various fields, that will consume lots of RAM. Likewise, if you run hundreds of queries in parallel, each with a dozen terms, that will also consume a considerable amount of RAM.
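To make the sorting point concrete, here's a small sketch (written against a Lucene 2.9/3.x-style API; the index path and the "timestamp"/"body" field names are just placeholders for your own). The first search that sorts on a field populates the FieldCache with that field's value for every document in the index - roughly 8 bytes per document for a numeric field (so ~80 MB for 10M docs), and considerably more for string fields.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    import java.io.File;

    public class SortMemoryDemo {
      public static void main(String[] args) throws Exception {
        // Open the existing index; "/path/to/index" is a placeholder.
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // Sorting on a field forces Lucene to load that field's value for
        // *every* document into the FieldCache the first time it is used.
        // With millions of documents this is where much of the RAM goes.
        Sort sort = new Sort(new SortField("timestamp", SortField.LONG));

        long before = usedHeap();
        TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), null, 10, sort);
        long after = usedHeap();

        System.out.println("hits: " + hits.totalHits
            + ", extra heap used by the sort: " + (after - before) / (1024 * 1024) + " MB");

        searcher.close();
        reader.close();
      }

      private static long usedHeap() {
        System.gc(); // rough measurement only
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
      }
    }

The cache stays resident for the life of the reader, so every additional sort field adds its own array on top.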
But if you don't do anything extreme w/ it, and you can allocate enough heap, then you should be ok.

The way I make such decisions is to design a test which mimics the typical/common scenario I expect to face, run it on a machine as close as possible to the one that will be used in production, and analyze the results. If you do that and you're not satisfied w/ the results, you're welcome to post back w/ the machine statistics and your exact use case, and I believe there are plenty of folks here who'd be willing to help you optimize your app's usage of Lucene. Or at least then we'll be able to tell you: "for this index and this machine, you cannot run a 100GB index".

Shai

On Thu, Jul 23, 2009 at 10:42 AM, m.harig <m.ha...@gmail.com> wrote:
>
> Thanks all,
>
> Very thankful to all. I am tired of hadoop settings. Is it good to read
> such a large index with lucene alone? Will it go for OOM? Anyone pl
> suggest me.
>
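P.S. If it helps to get started, below is a rough sketch of the kind of load test I mean, again against a Lucene 2.9-style API. The index path, field names, query strings, thread count and query count are all placeholders you'd replace w/ your own; the point is just to fire realistic queries concurrently and watch latency and heap.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class SearchLoadTest {
      public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
        final IndexSearcher searcher = new IndexSearcher(reader); // thread-safe, share one instance

        // Queries that resemble what real users will send; adjust to your data.
        final String[] queries = { "body:lucene", "title:hadoop AND body:index", "body:\"open source\"" };

        final int queriesPerThread = 500;
        int threads = 20; // simulated concurrent users
        final AtomicLong totalMillis = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
          pool.submit(new Runnable() {
            public void run() {
              try {
                // QueryParser is not thread-safe, so create one per thread.
                QueryParser parser = new QueryParser(Version.LUCENE_29, "body",
                    new StandardAnalyzer(Version.LUCENE_29));
                for (int i = 0; i < queriesPerThread; i++) {
                  long start = System.currentTimeMillis();
                  searcher.search(parser.parse(queries[i % queries.length]), 10);
                  totalMillis.addAndGet(System.currentTimeMillis() - start);
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        long executed = (long) threads * queriesPerThread;
        System.out.println("avg latency: " + (totalMillis.get() / executed) + " ms, heap used: "
            + (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024) + " MB");

        searcher.close();
        reader.close();
      }
    }

Run it w/ the same -Xmx you plan to give the production JVM and watch the process w/ jconsole or a profiler; that will tell you far more than any rule of thumb about whether a 100GB index fits your hardware.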