RE: NewBie To Lucene || Perfect configuration on a 64 bit server

Shruthi Tue, 20 May 2014 02:57:20 -0700

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, May 20, 2014 3:01 PM
To: java-user@lucene.apache.org
Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server

On Tue, 2014-05-20 at 10:40 +0200, Shruthi wrote:

> Just the indexing took 20 seconds :(

That's more than I expected, but it leaves the same question:

Is 20 seconds an acceptable response time for your users?

Shruthi: It's definitely not acceptable. Please find attached the piece of code
that we are using; it takes 20 seconds. That's why I drafted this ticket, to see
where I was going wrong.

I don't know your document size, but unless they are very large, the

response times from a full 10M document index will be way better than 20

seconds. Even on a low-RAM machine with spinning drives.

> We are yet to try on 64 bit server to check if that would change

> drastically.

I doubt it will.

Toke:

> RAMDirectory seems a better choice.

>

> Shruthi : But RAMDirectory has bad concurrency in multithreaded

> environments.

I assumed you would be creating a dedicated index for each request,

thereby effectively having single threaded usage for each separate

index.

Shruthi: Yes, we are creating a dedicated index for each request. OK, so
RAMDirectory holds good for our use case then. By the way, we will also be using
the Highlighter API. We just found out that using that API increased the index
size by 4 times.

I just remembered that Lucene has an implementation dedicated to fast

indexing. Take a look at

http://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html

It seems like just the thing for your use case.

Shruthi: Thank you, I will definitely try this.
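
For readers following along, here is a minimal sketch of what trying MemoryIndex could look like. It assumes Lucene 4.8.0 with the lucene-core, lucene-memory, lucene-analyzers-common and lucene-queryparser jars on the classpath. The class name, the field name "content" and the sample transcript text are made up for illustration and do not come from the thread. Per its Javadoc, a MemoryIndex holds a single document, so a set of 500 documents would need one instance per document.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MemoryIndexSketch {

    /**
     * Indexes one text in memory and runs one query against it.
     * Returns a score above 0 on a match and 0 when nothing matches.
     */
    static float score(String text, String queryString) throws ParseException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);

        // Single-document, in-memory index; nothing is written to disk.
        MemoryIndex index = new MemoryIndex();
        index.addField("content", text, analyzer); // "content" is an assumed field name

        Query query = new QueryParser(Version.LUCENE_48, "content", analyzer)
                .parse(queryString);
        return index.search(query);
    }

    public static void main(String[] args) throws ParseException {
        // Hypothetical sample text, not real data.
        String transcript = "Patient was prescribed amoxicillin 500 mg twice daily.";
        System.out.println("term match:   " + (score(transcript, "amoxicillin") > 0.0f));
        System.out.println("phrase match: " + (score(transcript, "\"twice daily\"") > 0.0f));
    }
}
```

Whether this is actually faster than the RAMDirectory approach for a given document set would have to be measured.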

> Shruthi : The same user from the same client will not be searching for

> same phrase again unless he has amnesia. This was already discussed

> with our architects.

If your architects base their decisions on observed user behaviour, then

fine. At our library, many users refine their queries, meaning that a

common pattern is 2-4 queries that are very much alike.

Shruthi : I will put forward this approach. We search medical transcripts and 
most of the time users search for drug names. I’m not sure if we can generalize 
this query.

> Shruthi:  Actually we have a DB query that runs prior to indexing

> which fetches max. 500 docs from 10million+ docs in NASSHARE. We then

> have to apply search phrase only on the resultant set. So this way

> the set is just limited to 500-1000.

Frankly, the combination of a pre-selection with a DB query and the

addon of heavy index + search with Lucene seems like the absolute worst

of both worlds.

Does the DB-selector do anything that cannot easily be replicated in

Lucene?

Shruthi: Well, it's a two-stage process. The client is looking at historical data
based on parameters like names, dates, MRN, fields, etc., so the query actually
gets the data set fulfilling the requirements.

If the client is interested in doing a text search, then he would pass the search
phrase on the result set.
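
The two-stage flow described here can be sketched in plain Java, with no Lucene or database involved. Everything in the sketch is an assumption made for illustration: the Transcript shape, selecting by MRN and date range, and a naive case-insensitive substring check standing in for the real text search. It only shows the order of operations: structured pre-selection first, then the phrase applied to the capped result set.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TwoStageSearchSketch {

    /** Hypothetical record shape; the thread does not specify the real schema. */
    static final class Transcript {
        final String mrn;
        final LocalDate date;
        final String text;

        Transcript(String mrn, LocalDate date, String text) {
            this.mrn = mrn;
            this.date = date;
            this.text = text;
        }
    }

    /** Stage 1: stand-in for the DB query. Selects by MRN and inclusive date range, capped at limit. */
    static List<Transcript> preselect(List<Transcript> all, String mrn,
                                      LocalDate from, LocalDate to, int limit) {
        List<Transcript> selected = new ArrayList<>();
        for (Transcript t : all) {
            if (selected.size() >= limit) {
                break;
            }
            boolean inRange = !t.date.isBefore(from) && !t.date.isAfter(to);
            if (t.mrn.equals(mrn) && inRange) {
                selected.add(t);
            }
        }
        return selected;
    }

    /** Stage 2: applies the search phrase only to the pre-selected documents. */
    static List<Transcript> phraseSearch(List<Transcript> candidates, String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<Transcript> hits = new ArrayList<>();
        for (Transcript t : candidates) {
            if (t.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(t);
            }
        }
        return hits;
    }
}
```

A substring scan does no tokenization, stemming or scoring, so it is not a substitute for an analyzer-based search; it only makes the shape of the pipeline concrete.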

- Toke Eskildsen, State and University Library, Denmark

---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org

For additional commands, e-mail: java-user-h...@lucene.apache.org

---------------------------------------------------------------------
