-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, May 20, 2014 3:01 PM
To: java-user@lucene.apache.org
Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server
On Tue, 2014-05-20 at 10:40 +0200, Shruthi wrote:
> Just the indexing took 20 seconds

That's more than I expected, but it leaves the same question: is 20 seconds an acceptable response time for your users?

Shruthi: It's definitely not acceptable. Please find attached the piece of code we are using; it takes 20 seconds. That’s why I raised this ticket, to see where I was going wrong.

I don't know your document size, but unless the documents are very large, the response times from a full 10M-document index will be far better than 20 seconds, even on a low-RAM machine with spinning drives.

> We are yet to try on a 64-bit server to check if that would change
> drastically.

I doubt it will.

Toke:
> RAMDirectory seems a better choice.
>
> Shruthi: But RAMDirectory has bad concurrency in multithreaded
> environments.

I assumed you would be creating a dedicated index for each request, thereby effectively having single-threaded usage for each separate index.

Shruthi: Yes, we are creating a dedicated index for each request, so RAMDirectory holds good for our use case then. By the way, we will also be using the Highlighter API; we just found out that using that API increased the index size by 4 times.

I just remembered that Lucene has an implementation dedicated to fast indexing. Take a look at
http://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html
It seems like just the thing for your use case.

Shruthi: Thank you, I will definitely try this.

> Shruthi: The same user from the same client will not be searching for
> the same phrase again unless he has amnesia. This was already discussed
> with our architects.

If your architects base their decisions on observed user behaviour, then fine. At our library, many users refine their queries, meaning that a common pattern is 2-4 queries that are very much alike.

Shruthi: I will put forward this approach. We search medical transcripts, and most of the time users search for drug names.
I’m not sure if we can generalize this query.

> Shruthi: Actually, we have a DB query that runs prior to indexing,
> which fetches at most 500 docs from the 10 million+ docs in NASSHARE.
> We then have to apply the search phrase only to the resulting set,
> so the set is limited to 500-1000 documents.

Frankly, the combination of a pre-selection with a DB query and the add-on of heavy indexing + search with Lucene seems like the absolute worst of both worlds. Does the DB selector do anything that cannot easily be replicated in Lucene?

Shruthi: Well, it's a two-stage process. The client looks at historical data based on parameters like names, dates, MRN, fields, etc., so the query fetches the data set that fulfils those requirements. If the client is interested in doing a text search, he passes a search phrase, which is applied to that result set.

- Toke Eskildsen, State and University Library, Denmark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
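A minimal sketch of how the MemoryIndex suggestion above might be applied to the two-stage flow Shruthi describes (a DB pre-selection of 500-1000 transcripts, then a phrase search over only that set). It assumes Lucene 4.8 with the lucene-core, lucene-analyzers-common, lucene-queryparser and lucene-memory jars on the classpath. The class name TranscriptPhraseFilter, the field name "text" and the sample transcripts are hypothetical, and this is not the code attached to the thread. A MemoryIndex holds exactly one document, so the sketch builds one per transcript and reuses a single parsed Query across all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Hypothetical helper (assumes Lucene 4.8): filters the pre-selected
// transcripts with a user query, one single-document MemoryIndex each.
public class TranscriptPhraseFilter {

    // Hypothetical field name; indexing and query parsing must agree on it.
    private static final String FIELD = "text";

    public static List<String> filter(List<String> transcripts, String userQuery) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
        Query query;
        try {
            // Parse once; the same Query object is reused for every transcript.
            query = new QueryParser(Version.LUCENE_48, FIELD, analyzer).parse(userQuery);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad query: " + userQuery, e);
        }

        List<String> hits = new ArrayList<>();
        for (String transcript : transcripts) {
            // Build a throwaway in-memory index holding just this transcript.
            MemoryIndex index = new MemoryIndex();
            index.addField(FIELD, transcript, analyzer);
            // search() returns a score in [0.0, 1.0]; 0.0 means no match.
            if (index.search(query) > 0.0f) {
                hits.add(transcript);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> transcripts = Arrays.asList(
                "Patient was started on metformin 500 mg twice daily.",
                "No medication changes at this visit.");
        // The double quotes make this a phrase query.
        System.out.println(filter(transcripts, "\"metformin 500\""));
    }
}
```

Because no IndexWriter or RAMDirectory is set up per request, this avoids much of the per-request indexing overhead; whether it gets well under the 20 seconds reported for 500-1000 documents would still need to be measured, and the Highlighter requirement is not covered here.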