Hi David,

Thanks for your reply.

After posting the question, I found a more efficient way to do this.

- I used only a single NutchBean and modified it so that the search
method takes the indices being searched as an argument. This single
NutchBean creates a separate IndexReader on the merged index in each
directory and keeps the readers in a map.

- Based on the indices being searched, NutchBean creates an
IndexSearcher using the appropriate IndexReaders. I have added a
constructor to IndexSearcher that takes an array of IndexReaders and
uses a MultiReader to initialize itself.

- The NutchBean creates a single FetchedSegments that combines the
segments directories from all of the index directories.
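
A rough sketch of the reader-map idea described above, for anyone who
wants to see its shape. It deliberately uses stand-in types so that it
runs without Lucene or Nutch on the classpath: DocReader, ListReader,
CombinedReader and ReaderRegistry are hypothetical names invented for
this illustration, not the real IndexReader, MultiReader or NutchBean
classes, and the actual modification may differ in detail.

```java
import java.util.*;

/** Hypothetical stand-in for a per-index reader (not Lucene's IndexReader). */
interface DocReader {
    int numDocs();
    String document(int n);
}

/** In-memory reader, present only to make the sketch runnable. */
final class ListReader implements DocReader {
    private final List<String> docs;
    ListReader(List<String> docs) { this.docs = new ArrayList<>(docs); }
    public int numDocs() { return docs.size(); }
    public String document(int n) { return docs.get(n); }
}

/** Presents several readers as one by offsetting document numbers. */
final class CombinedReader implements DocReader {
    private final DocReader[] subReaders;
    // starts[i] is the first combined doc number served by subReaders[i];
    // the final entry holds the total document count.
    private final int[] starts;

    CombinedReader(DocReader[] subReaders) {
        this.subReaders = subReaders.clone();
        this.starts = new int[subReaders.length + 1];
        for (int i = 0; i < subReaders.length; i++) {
            starts[i + 1] = starts[i] + subReaders[i].numDocs();
        }
    }

    public int numDocs() { return starts[subReaders.length]; }

    public String document(int n) {
        if (n < 0 || n >= numDocs()) {
            throw new IndexOutOfBoundsException("doc " + n);
        }
        int i = 0;
        while (n >= starts[i + 1]) i++;   // find the sub-reader owning doc n
        return subReaders[i].document(n - starts[i]);
    }
}

/** Keeps one reader per index, opened once, and combines them per search. */
final class ReaderRegistry {
    private final Map<String, DocReader> readers = new LinkedHashMap<>();

    void register(String indexName, DocReader reader) {
        readers.put(indexName, reader);
    }

    /** Builds a combined view over the selected indices without reopening anything. */
    DocReader readerFor(List<String> selected) {
        DocReader[] chosen = new DocReader[selected.size()];
        for (int i = 0; i < chosen.length; i++) {
            DocReader r = readers.get(selected.get(i));
            if (r == null) {
                throw new IllegalArgumentException("unknown index: " + selected.get(i));
            }
            chosen[i] = r;
        }
        return new CombinedReader(chosen);
    }
}
```

Each search calls readerFor() with the names of the ticked indices; the
underlying readers are shared between searches, and only the lightweight
combined wrapper is created per request.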

The advantages of this approach are:

- A single IndexReader for an index - so no additional filehandles are created.
- No opening / closing of readers or segments - this improves performance.
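
For a sense of scale: keeping one bean per possible selection of n
indices means 2^n - 1 non-empty combinations, whereas the reader map
holds only n readers. A small sketch of that arithmetic (HandleCount is
a made-up helper name, used only for illustration):

```java
final class HandleCount {
    /** Number of non-empty selections from n indices: 2^n - 1. */
    static long nonEmptySelections(int n) {
        if (n < 0 || n > 62) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        return (1L << n) - 1;
    }
}
```

With 9 indices this gives the 511 combinations mentioned in the quoted
message below, against 9 readers held in the map.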


- Ravi Chintakunta


> This is almost exactly what I've done.  I create a new NutchBean for
> each search, and point it at whichever of 9 subdirectories the user has
> selected; because I really don't want 511 (2^9-1) beans hanging around.
>
> The reason for the "too many open files" is that the NutchBean doesn't
> clean up after itself - I guess because for most people, the NutchBean
> is going to be reused.
>
> I added a close() method to FetchedSegments.Segment in my installation,
> to close all the readers.  I added a closeSegments() method to
> NutchBean, to call close() on each segment that's been opened.  Then I
> call closeSegments() after each search.
>
> I realise that NutchBean really wasn't designed to support being
> instantiated once per search, but I don't care.  It works well, and
> performance is not an issue.
>
> Regards,
> David.
>
>
> Date: Mon, 6 Feb 2006 20:59:34 -0500
> From: Ravi Chintakunta <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Subject: [Nutch-general] Dynamic merging of indices
> Reply-To: [EMAIL PROTECTED]
>
> I have multiple indices for the crawls across various intranet sites
> stored in separate folders. My search application should support
> searching across one or more of these indices dynamically - by way of
> checkboxes on the web page.  For this, I have modified NutchBean to
> create the IndexSearcher and FetchedSegments from the segments
> directory (not the merged index directory) in these folders.  Based on
> the selected intranet sites, a NutchBean is instantiated for the
> indices  of the selected sites and the results are displayed.
>
> With this I hit the "Too many open files" error and have increased the
> open-file limit.
>
> This seems to work well now. But if I have 5 such sites, then I am
> opening 2^5 = 32 times more files than I would have opened.
>
> My question is: Is there a better way of doing this? Like:
>
> - Can I open an IndexReader on each of the merged index directories and
> dynamically create an IndexSearcher by merging these readers using
> MultiReader?
>
> - Is an IndexReader thread safe and can it be used simultaneously in
> different IndexSearchers?
>
> - Can I create the IndexReader on the merged index directory and
> create the corresponding FetchedSegments on the corresponding
> non-merged segments directory?
>
> Thanks
> Ravi Chintakunta