Hi All,

Thanks for the suggestions, but our requirements are slightly different:
1. We don't index or search all 10 million documents for a keyword. We index 
and search only 500 documents, because the final result must come from that 
set of 500.
2. We have already filtered those 500 documents out of the 10M+ documents using 
a DB stored procedure, which has nothing to do with search keywords.
3. Our search algorithm then operates on this new set of 500 documents, where 
it plays a vital role.
4. We can't avoid on-the-fly indexing, because the set of documents to be 
indexed is random and ever-changing.
   We could index the existing 10M+ documents beforehand and keep the indexes 
ready, but we don't want to search the complete document store. We only want 
to search the 500 documents obtained above.
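
To make the constraint in points 1-4 concrete, here is a minimal sketch of 
what "search only the 500" means against a pre-built index. It is plain Java, 
not the Lucene API: the class CandidateSearch and its methods are hypothetical 
names used only for illustration, and the candidate set stands in for the IDs 
returned by the stored procedure.

```java
import java.util.*;

public class CandidateSearch {
    // Toy inverted index: term -> sorted set of ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index one document once, ahead of time (hypothetical helper).
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term, keeping only hits inside the candidate set
    // (assumed here to be the ids returned by the DB stored procedure).
    public List<Integer> search(String term, Set<Integer> candidates) {
        Set<Integer> hits = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet());
        List<Integer> result = new ArrayList<>();
        for (int id : hits) {
            if (candidates.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CandidateSearch index = new CandidateSearch();
        index.add(1, "Patient reports chest pain");
        index.add(2, "No pain reported");
        index.add(3, "Follow-up visit for chest X-ray");

        // Stand-in for the 500 ids produced by the stored procedure.
        Set<Integer> candidates = new HashSet<>(Arrays.asList(2, 3));

        System.out.println(index.search("pain", candidates));  // [2]
        System.out.println(index.search("chest", candidates)); // [3]
    }
}
```

In this sketch the index is built once and each request only supplies a 
different candidate set, so nothing is re-indexed per request.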

Is there a better alternative that satisfies this requirement?

Thanks,

Shruthi Sethi
SR. SOFTWARE ENGINEER
iMedX
OFFICE: 033-4001-5789 ext. N/A
MOBILE: 91-9903957546
EMAIL: sse...@imedx.com
WEB: www.imedx.com



-----Original Message-----
From: shashi....@gmail.com [mailto:shashi....@gmail.com] On Behalf Of Shashi Kant
Sent: Saturday, May 24, 2014 5:55 AM
To: java-user@lucene.apache.org
Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server

To second Vitaly's suggestion: you should consider using Apache Solr
instead. It handles such issues out of the box.


On Fri, May 23, 2014 at 7:52 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
> At the risk of sounding overly critical here, I would say you need to scrap
> your entire approach of building one small index per request, and just
> build your entire searchable data store in Lucene/Solr. This is the
> simplest and probably most maintainable and scalable solution. Even if your
> index contains 10M+ documents, returning at most 500 search results should
> be lightning fast compared to the latencies you're seeing right now. To
> facilitate data export from the DB, take a look at this:
> http://wiki.apache.org/solr/DataImportHandler
>
>
> On Tue, May 20, 2014 at 7:36 AM, Shruthi <sse...@imedx.com> wrote:
>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
>> Sent: Tuesday, May 20, 2014 3:48 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server
>>
>> On Tue, 2014-05-20 at 11:56 +0200, Shruthi wrote:
>>
>> Toke:
>> > Is 20 second an acceptable response time for your users?
>> >
>> > Shruthi: It's definitely not acceptable. Please find attached the piece
>> > of code that we are using. It's taking 20 seconds. That's why I drafted
>> > this ticket, to see where I was going wrong.
>>
>> Indexing 1000 documents/sec in Lucene is quite common, so even taking
>> into account large documents, 20 seconds sounds like quite a bit.
>> Shruthi: I attached the code snippet in my previous mail. Do you suspect
>> a problem there?
>>
>> > Shruthi: Well, it's a two-stage process. The client looks at
>> > historical data based on parameters like names, dates, MRN, fields,
>> > etc., so the query actually gets the data set fulfilling those
>> > requirements.
>> >
>> > If the client is interested in doing a text search, he then passes the
>> > search phrase to run on that result set.
>>
>> So it is not possible for a client to perform a broad phrase search to
>> start with. And it sounds like your DB-queries are all simple matching?
>> No complex joins and such? If so, this calls even more for a full
>> Lucene-index solution, which handles all aspects of the search process.
>> Shruthi: We call a DB stored procedure to get the result set to work
>> with.
>> We will be using the Highlighter API, and I don't think MemoryIndex can be
>> used with the highlighter.
>>
>> >
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>



-- 
sk...@alum.mit.edu
(617) 595-5946
