RE: lucene norms cached twice

Cabansag, Ronald-Alvin R Fri, 29 Oct 2010 12:32:32 -0700

We create a single instance of IndexReader using IndexReader.open(new 
MMapDirectory(file)) and a single instance of IndexSearcher using this index 
reader.


Searches in our application are done in two operations. The first operation 
gets the hit count. We call QueryWrapperFilter.getDocIdSet(indexReader) to get 
the DocIdSet and compute the hit count using its iterator. This is where we see 
the ReadOnlyDirectoryReader caching its own copy of the norms.

The second operation is where we actually do the search, using the 
IndexSearcher.search(new TermQuery(...), filter, collector) method. This is 
where the sub-reader caches its own copy of the norms.
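
To make the sequence above concrete, here is a minimal plain-Java model of the 
two operations. It uses no Lucene classes: ToySegmentReader, ToyCompositeReader 
and simulate are hypothetical names, and the caching rules are only my 
assumption of how the real readers behave (the top-level reader builds and 
caches a merged norms array, while the sub-reader caches its own array when 
asked directly). Treat it as a sketch of the mechanism, not as Lucene's code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NormsCacheModel {

    /** Hypothetical stand-in for a segment-level reader. */
    static class ToySegmentReader {
        final int maxDoc;
        final Map<String, byte[]> normsCache = new HashMap<>();

        ToySegmentReader(int maxDoc) { this.maxDoc = maxDoc; }

        /** Cached access, as a per-segment search would use it. */
        byte[] norms(String field) {
            return normsCache.computeIfAbsent(field, f -> new byte[maxDoc]);
        }

        /** Copies norms into the caller's array; assumed NOT to fill this reader's cache. */
        void norms(String field, byte[] target, int offset) {
            byte[] cached = normsCache.get(field);
            if (cached != null) {
                System.arraycopy(cached, 0, target, offset, maxDoc);
            }
            // Otherwise a real reader would read from disk straight into target.
        }
    }

    /** Hypothetical stand-in for the top-level (directory) reader. */
    static class ToyCompositeReader {
        final List<ToySegmentReader> subReaders;
        final Map<String, byte[]> normsCache = new HashMap<>();

        ToyCompositeReader(List<ToySegmentReader> subReaders) { this.subReaders = subReaders; }

        int maxDoc() {
            int total = 0;
            for (ToySegmentReader sub : subReaders) total += sub.maxDoc;
            return total;
        }

        /** Builds one merged array across all segments and caches it on this reader. */
        byte[] norms(String field) {
            byte[] cached = normsCache.get(field);
            if (cached != null) return cached;
            byte[] merged = new byte[maxDoc()];
            int offset = 0;
            for (ToySegmentReader sub : subReaders) {
                sub.norms(field, merged, offset);
                offset += sub.maxDoc;
            }
            normsCache.put(field, merged);
            return merged;
        }
    }

    /** Total bytes of norms held by the top-level reader plus all of its sub-readers. */
    static long cachedBytes(ToyCompositeReader top) {
        long total = 0;
        for (byte[] b : top.normsCache.values()) total += b.length;
        for (ToySegmentReader sub : top.subReaders) {
            for (byte[] b : sub.normsCache.values()) total += b.length;
        }
        return total;
    }

    /** Runs both operations on a single-segment index and returns the cached norms size. */
    static long simulate(boolean countOnTopLevel, int maxDoc) {
        ToySegmentReader segment = new ToySegmentReader(maxDoc);
        ToyCompositeReader top = new ToyCompositeReader(Collections.singletonList(segment));

        // Operation 1: hit count, either on the top-level reader or per sub-reader.
        if (countOnTopLevel) {
            top.norms("body");
        } else {
            for (ToySegmentReader sub : top.subReaders) sub.norms("body");
        }

        // Operation 2: the actual search, which runs per sub-reader.
        for (ToySegmentReader sub : top.subReaders) sub.norms("body");

        return cachedBytes(top);
    }

    public static void main(String[] args) {
        System.out.println("count on top-level reader: " + simulate(true, 1000) + " bytes");
        System.out.println("count per sub-reader:      " + simulate(false, 1000) + " bytes");
    }
}
```

In this model the first case ends up holding 2000 bytes for a 1000-document 
segment and the second holds 1000. If the real readers behave the same way, 
computing the hit count per sub-reader (for example by iterating over 
IndexReader.getSequentialSubReaders()) might avoid the second copy, but that is 
an assumption we have not verified.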

Regards,
Al

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com] 
Sent: Friday, October 29, 2010 1:27 PM
To: java-user@lucene.apache.org
Subject: Re: lucene norms cached twice

Norms should not normally be loaded twice.

Since 2.9, searching is done at the sub-reader level, and so norms
should never be loaded for the main reader.

But can you describe how you're using Lucene?

Mike

On Fri, Oct 29, 2010 at 9:27 AM, Cabansag, Ronald-Alvin R
<ronald-alvin.caban...@cengage.com> wrote:
>
> We are working with a large read-only Lucene index (single segment) with a large 
> number of fields and documents, and are running into memory usage problems.
>
> We found that when using a ReadOnlyDirectoryReader and IndexSearcher created 
> using the same reader, the norms are cached twice - first by the reader 
> itself and second by the reader's subreaders. Is there an easy way to avoid 
> having the norms cached twice when we only have a single subreader?
>
> We thought of the following options:
> 1.) Pass in the main reader as a subreader when creating the IndexSearcher 
> (e.g. new IndexSearcher(mainReader, new IndexReader[] {mainReader}, new int[] {0})).
> 2.) Override the ReadOnlyDirectoryReader.getSequentialSubReaders() method to 
> return null. This tells the IndexSearcher to use the main reader 
> (the ReadOnlyDirectoryReader).
> 3.) Use SegmentReader.get(boolean, SegmentInfo, int) to create a 
> ReadOnlySegmentReader that we use as our main reader instead.
>
> Are there any negative implications to the above approaches? Or are there 
> better approaches to the problem?
>
> Thanks in advance for any help.
>
> Alvin Cab
> Cengage Learning
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
