Hi
I've now done the following:
public Summary[] search(final SearchRequest searchRequest)
        throws SearchExecutionException {
    final String searchTerm = searchRequest.getSearchTerm();
    if (StringUtils.isBlank(searchTerm)) {
        throw new SearchExecutionException(
                "Search string cannot be empty. There will be too many results to process.");
    }
    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
    try {
        LOGGER.debug("Ensuring all index readers are up to date...");
        maybeReopen();
        LOGGER.debug("All index searchers are up to date. No. of index searchers: '"
                + indexSearchers.size() + "'");
        Query query = queryParser.parse(searchTerm);
        LOGGER.debug("Search term '" + searchTerm + "' ----> Lucene query '"
                + query.toString() + "'");
        Sort sort = applySortIfApplicable(searchRequest);
        Filter[] filters = applyFiltersIfApplicable(searchRequest);
        ChainedFilter chainedFilter = null;
        if (filters != null) {
            chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
        }
        TopDocs topDocs = get().search(query, chainedFilter, 100, sort);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("Total number of hits for [" + query.toString() + "] = "
                + topDocs.totalHits);
        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
        multiSearcher.close();
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
    stopWatch.stop();
    LOGGER.debug("Total time taken for document search: "
            + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}
And I have the following methods:
@PostConstruct
public void initialiseQueryParser() {
    PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
    analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
            new KeywordAnalyzer());
    queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
            analyzerWrapper);
    try {
        LOGGER.debug("Initialising multi searcher ....");
        this.multiSearcher = new MultiSearcher(
                searchers.toArray(new IndexSearcher[] {}));
        LOGGER.debug("multi searcher initialised");
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}
This initialises the MultiSearcher when the class is created by Spring.
private synchronized void swapMultiSearcher(MultiSearcher newMultiSearcher) {
    try {
        release(multiSearcher);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
    multiSearcher = newMultiSearcher;
}
public void maybeReopen() throws IOException {
    MultiSearcher newMultiSearcher = null;
    boolean refreshMultiSearcher = false;
    List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
    synchronized (searchers) {
        for (IndexSearcher indexSearcher : searchers) {
            IndexReader reader = indexSearcher.getIndexReader();
            reader.incRef();
            Directory directory = reader.directory();
            long currentVersion = reader.getVersion();
            if (IndexReader.getCurrentVersion(directory) != currentVersion) {
                IndexReader newReader = indexSearcher.getIndexReader().reopen();
                if (newReader != reader) {
                    reader.decRef();
                    refreshMultiSearcher = true;
                }
                reader = newReader;
                IndexSearcher newSearcher = new IndexSearcher(newReader);
                indexSearchers.add(newSearcher);
            }
        }
    }
    if (refreshMultiSearcher) {
        newMultiSearcher = new MultiSearcher(
                indexSearchers.toArray(new IndexSearcher[] {}));
        warm(newMultiSearcher);
        swapMultiSearcher(newMultiSearcher);
    }
}

private void warm(MultiSearcher newMultiSearcher) {
}
private synchronized MultiSearcher get() {
    for (IndexSearcher indexSearcher : searchers) {
        indexSearcher.getIndexReader().incRef();
    }
    return multiSearcher;
}

private synchronized void release(MultiSearcher multiSearcher) throws IOException {
    for (IndexSearcher indexSearcher : searchers) {
        indexSearcher.getIndexReader().decRef();
    }
}
However I am now getting
java.lang.IllegalStateException:
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
on the call:
private synchronized MultiSearcher get() {
    for (IndexSearcher indexSearcher : searchers) {
        indexSearcher.getIndexReader().incRef();
    }
    return multiSearcher;
}
I'm doing something wrong, obviously... I'm just not sure where.
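For reference, here is a minimal sketch (plain Java, not Lucene's actual classes; `RefCounted` is just a stand-in for `IndexReader`) of the reference-counting contract that `incRef()`/`decRef()` assume. The likely culprit above is that `get()` incRefs the readers with no matching `decRef()` after each search, while `multiSearcher.close()` inside `search(...)` closes the underlying readers, so the next `incRef()` finds them already closed:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an IndexReader: every incRef() must be balanced by a decRef(),
// and the reader only really closes once the count drops to zero.
class RefCounted {
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = owner's ref
    private volatile boolean closed = false;

    void incRef() {
        if (closed) throw new IllegalStateException("already closed");
        refs.incrementAndGet();
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // last reference releases the underlying resources
        }
    }

    boolean isClosed() {
        return closed;
    }
}

public class RefCountDemo {
    public static void main(String[] args) {
        RefCounted reader = new RefCounted();

        reader.incRef();  // a search acquires the reader (as in get())
        reader.decRef();  // ...and releases it when the search finishes

        reader.decRef();  // owner closes (like multiSearcher.close())

        // The next acquire now fails, mirroring AlreadyClosedException:
        try {
            reader.incRef();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The search path would therefore pair each `get()` with a `release()` in a finally block, and never call `close()` on a searcher that other requests may still be using.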
Cheers
On Sun, Mar 1, 2009 at 1:36 PM, Michael McCandless <
[email protected]> wrote:
>
> I was wondering the same thing ;)
>
> It's best to call this method from a single BG "warming" thread, in which
> case it would not need its own synchronization.
>
> But, to be safe, I'll add internal synchronization to it. You can't simply
> put synchronized in front of the method, since you don't want this to block
> searching.
>
>
> Mike
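The single background warming thread that Mike describes above could be sketched roughly like this (hypothetical names throughout: `SearcherStub` stands in for `MultiSearcher`, and the `latestVersion` argument stands in for `IndexReader.getCurrentVersion`): the warming thread does the reopen and warm-up off the search path, then publishes the fresh searcher with a volatile write, so `get()` never blocks on a reopen.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a single background "warming" thread: reopen + warm happen off
// the search path; publication is a volatile write, so get() never blocks.
class SearcherManagerSketch {
    static class SearcherStub {
        final long version;
        SearcherStub(long version) { this.version = version; }
    }

    private volatile SearcherStub current = new SearcherStub(0);

    SearcherStub get() {
        return current; // cheap volatile read; never waits for a reopen
    }

    // Only ever called from the one warming thread, so it needs no lock of
    // its own.
    void maybeReopen(long latestVersion) {
        if (current.version != latestVersion) {
            SearcherStub fresh = new SearcherStub(latestVersion);
            warm(fresh);     // run typical queries before anyone sees it
            current = fresh; // publish; subsequent get() calls return it
        }
    }

    private void warm(SearcherStub s) {
        // run a few representative warm-up queries against s here
    }

    public static void main(String[] args) throws Exception {
        final SearcherManagerSketch manager = new SearcherManagerSketch();
        ScheduledExecutorService warmer =
                Executors.newSingleThreadScheduledExecutor();
        warmer.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                manager.maybeReopen(System.nanoTime());
            }
        }, 0, 1, TimeUnit.SECONDS);
        Thread.sleep(100); // search threads keep calling manager.get() meanwhile
        System.out.println("current version: " + manager.get().version);
        warmer.shutdown();
    }
}
```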
>
> Amin Mohammed-Coleman wrote:
>
> just a quick point:
>> public void maybeReopen() throws IOException { //D
>>     long currentVersion = currentSearcher.getIndexReader().getVersion();
>>     if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>>         IndexReader newReader = currentSearcher.getIndexReader().reopen();
>>         assert newReader != currentSearcher.getIndexReader();
>>         IndexSearcher newSearcher = new IndexSearcher(newReader);
>>         warm(newSearcher);
>>         swapSearcher(newSearcher);
>>     }
>> }
>>
>> should the above be synchronised?
>>
>> On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <[email protected]
>> >wrote:
>>
>>> Thanks. I will rewrite... in between giving my baby her feed and playing
>>> with the other child, and my wife who wants me to do several other things!
>>>
>>>
>>>
>>> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
>>> [email protected]> wrote:
>>>
>>>
>>>> Amin Mohammed-Coleman wrote:
>>>>
>>>> Hi
>>>>
>>>>> Thanks for your input. I would like to have a go at doing this myself
>>>>> first, Solr may be an option.
>>>>>
>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>> creating unnecessary garbage; instead, they should be created once
>>>>> & reused.
>>>>>
>>>>> -- I can move the code out so that it is only created once and reused.
>>>>>
>>>>>
>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>> when nothing has changed. This just generates unnecessary garbage
>>>>> which GC then must sweep up.
>>>>>
>>>>> -- This was something I thought about. I could move it out so that it's
>>>>> created once. However, I presume that inside my code I need to check
>>>>> whether the index readers are up to date. This needs to be synchronized
>>>>> as well, I guess(?)
>>>>>
>>>>>
>>>> Yes you should synchronize the check for whether the IndexReader is
>>>> current.
>>>>
>>>>> * I don't see any synchronization -- it looks like two search
>>>>> requests are allowed into this method at the same time? Which is
>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>> readers.
>>>>>
>>>>> -- So I need to extract the logic for reopening and provide a
>>>>> synchronisation mechanism.
>>>>>
>>>>>
>>>> Yes.
>>>>
>>>>
>>>>> Ok. So I have some work to do. I'll refactor the code and see if I can
>>>>> get in line with your recommendations.
>>>>>
>>>>>
>>>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>>>> [email protected]> wrote:
>>>>>
>>>>>
>>>>> On a quick look, I think there are a few problems with the code:
>>>>>>
>>>>>> * I don't see any synchronization -- it looks like two search
>>>>>> requests are allowed into this method at the same time? Which is
>>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>>> readers.
>>>>>>
>>>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>>> don't see a corresponding decRef.
>>>>>>
>>>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>>> meaning the unlucky search request that hits a reopen pays the
>>>>>> cost. This might be OK if the index is small enough that
>>>>>> reopening & warming takes very little time. But if index gets
>>>>>> large, making a random search pay that warming cost is not nice to
>>>>>> the end user. It erodes their trust in you.
>>>>>>
>>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>>> when nothing has changed. This just generates unnecessary garbage
>>>>>> which GC then must sweep up.
>>>>>>
>>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>>> creating unnecessary garbage; instead, they should be created once
>>>>>> & reused.
>>>>>>
>>>>>> You should consider simply using Solr -- it handles all this logic for
>>>>>> you and has been well debugged with time...
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Amin Mohammed-Coleman wrote:
>>>>>>
>>>>>>> The reason for the IndexReader.reopen() is because I have a webapp which
>>>>>>> enables users to upload files and then search for the documents. If I
>>>>>>> don't reopen, I'm concerned that the facet hit counter won't be updated.
>>>>>>>
>>>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <
>>>>>>> [email protected]
>>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>>> I have been able to get the code working for my scenario, however I have
>>>>>>>> a question and I was wondering if I could get some help. I have a list
>>>>>>>> of IndexSearchers which are used in a MultiSearcher class. I use the
>>>>>>>> IndexSearchers to get each IndexReader and put them into a
>>>>>>>> MultiReader.
>>>>>>>>
>>>>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>>>> for (int i = 0; i < searchables.length; i++) {
>>>>>>>>     IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>>>>>     readers[i] = indexSearcher.getIndexReader();
>>>>>>>>     IndexReader newReader = readers[i].reopen();
>>>>>>>>     if (newReader != readers[i]) {
>>>>>>>>         readers[i].close();
>>>>>>>>     }
>>>>>>>>     readers[i] = newReader;
>>>>>>>> }
>>>>>>>> multiReader = new MultiReader(readers);
>>>>>>>> OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
>>>>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>>>>
>>>>>>>>
>>>>>>>> I then use the IndexSearcher to do the facet stuff. I end the code with
>>>>>>>> closing the MultiReader. This is causing problems in another method
>>>>>>>> where I do some other search, as the IndexReaders are closed. Is it OK
>>>>>>>> to not close the MultiReader, or should I do some additional checks in
>>>>>>>> the other method to see if the IndexReader is closed?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>
>>>>>>>> P.S. Hope that made sense...!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks just what I needed!
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Amin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Amin:
>>>>>>>>>
>>>>>>>>>> Please take a look at this blog post:
>>>>>>>>>>
>>>>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>>>>> Best regards, Marcelo.
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Sorry to resend this email, but I was wondering if I could get some
>>>>>>>>>>> advice on this.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> Amin
>>>>>>>>>>>
>>>>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I am looking at building a faceted search using Lucene. I know that
>>>>>>>>>>>> Solr comes with this built in, however I would like to try this by
>>>>>>>>>>>> myself (something to add to my CV!). I have been looking around and
>>>>>>>>>>>> I found that you can use the IndexReader and use TermVectors. This
>>>>>>>>>>>> looks OK, but I'm not sure how to filter the results so that a
>>>>>>>>>>>> particular user can only see a subset of results. The next option I
>>>>>>>>>>>> was looking at was something like:
>>>>>>>>>>>>
>>>>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>>>>
>>>>>>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>>>>>>> time a new brand is created. Again, I'm not sure how I can filter
>>>>>>>>>>>> the results here. It may be that I'm using the wrong API methods to
>>>>>>>>>>>> do this.
>>>>>>>>>>>>
>>>>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> Amin
>>>>>>>>>>>>
>>>>>>>>>>>> P.S. I am basically trying to do something that displays the
>>>>>>>>>>>> following
>>>>>>>>>>>>
>>>>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>> Marcelo F. Ochoa
>>>>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>>>>> ______________
>>>>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>>>>
>>>>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>>>>> Is Oracle 11g REST ready?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>> For additional commands, e-mail: [email protected]
>