Re: [lucy-user] ClusterSearcher

goran kent Sun, 06 Nov 2011 01:16:38 -0800

On 11/5/11, Marvin Humphrey <[email protected]> wrote:
>         // This line triggers a call to the top_docs() subroutine within
>         // SearchClient.pm.  It blocks until top_docs() returns, and thus the
>         // total time to process all remote requests in this loop is the sum
>         // of all child node response times.


/releases held breath

My faith in Lucy is restored :)  I was dreading a response to the
effect that the remote search stuff was immutable and couldn't be
significantly improved.

> To process the searches in parallel, we need a select loop[1].  However,
> PolySearcher can only access SearchClient via the abstract
> Lucy::Search::Searcher interface -- it knows nothing about the socket calls
> that are being made by SearchClient.pm.  PolySearcher would have to pierce
> encapsulation in order to get at those sockets and multiplex the requests.

Yup, sounds like that approach is buggered.

> The most straightforward solution is to eliminate PolySearcher from the
> equation and to create a class that combines the functionality of
> PolySearcher and SearchClient.  Fortunately, neither of them is particularly
> large or complex, so the task is very doable.
>
> I propose that we name this new class LucyX::Remote::ClusterSearcher.
>
>   * Fork SearchClient.pm to ClusterSearcher.pm and t/510-remote.t to
>     t/550-cluster_searcher.t.
>   * Give ClusterSearcher the ability to talk to multiple SearchServers.
>   * Change to a two-stage RPC mechanism:
>     1. Fire off the requests to the individual SearchServers in a "for"
>        loop.
>     2. Gather the responses into an array using a select() loop (powered by
>        an IO::Select object).
>   * Adapt each of the Searcher methods that ClusterSearcher implements to
>     assemble a sensible return value from the array of responses using
>     PolySearcher's techniques.
>
> This won't be the end of our iterating if we want to build a robust
> clustering system, because it doesn't yet address either node availability
> issues or near-real-time updates.  However, it provides the functionality
> that we meant to make available via PolySearcher/SearchServer/SearchClient,
> allowing Goran to evaluate whether the system meets his basic requirements,
> and moves us incrementally towards a highly desirable goal: a
> ClusterSearcher object backed by multiple search nodes that is just as easy
> to use as an IndexSearcher backed by one index on one machine.
>
> PS: Goran...I'm under the weather right now, so if you're counting on me to
> code this up, I'm not sure how quickly I'll get to it.

No worries, you take care of yourself and get better.
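
For the archives, here is roughly how I picture the two-stage mechanism
from the proposal above: send every request first, then collect replies
with IO::Select as they arrive.  This is only a sketch and not Lucy code.
The forked "nodes", the one-line-per-message protocol, and the names
spawn_fake_node() and scatter_gather() are all made up for illustration;
the real SearchServer wire format is different.

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );
use IO::Select;
use POSIX ();
use Time::HiRes qw( sleep time );

# Hypothetical stand-in for a remote search node: a forked child that reads
# one request line, waits $delay seconds, then writes one reply line.
sub spawn_fake_node {
    my ( $name, $delay ) = @_;
    socketpair( my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
        or die "socketpair failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        close $parent_end;
        my $request = <$child_end>;
        $request = '' unless defined $request;
        chomp $request;
        sleep($delay);
        syswrite( $child_end, "$name answered '$request'\n" );
        POSIX::_exit(0);
    }
    close $child_end;
    return ( $parent_end, $pid );
}

# @nodes is a list of [ $name, $handle ] pairs.  Returns one hashref per
# reply, in arrival order.
sub scatter_gather {
    my ( $query, $timeout, @nodes ) = @_;

    # Stage 1: fire off every request without waiting for any reply.
    my %pending;
    for my $node (@nodes) {
        my ( $name, $fh ) = @$node;
        defined syswrite( $fh, "$query\n" ) or die "write to $name failed: $!";
        $pending{ fileno($fh) } = { name => $name, fh => $fh, buf => '' };
    }

    # Stage 2: gather the replies with a select loop.
    my $select = IO::Select->new( map { $_->{fh} } values %pending );
    my @responses;
    while ( $select->count ) {
        my @ready = $select->can_read($timeout);
        last unless @ready;    # timed out: give up on the stragglers
        for my $fh (@ready) {
            my $slot = $pending{ fileno($fh) };
            my $got = sysread( $fh, $slot->{buf}, 4096, length( $slot->{buf} ) );
            die "read from $slot->{name} failed: $!" unless defined $got;
            if ( $slot->{buf} =~ /\A(.*)\n/ ) {
                push @responses, { name => $slot->{name}, reply => $1 };
                $select->remove($fh);
            }
            elsif ( $got == 0 ) {
                $select->remove($fh);    # node hung up without a full reply
            }
        }
    }
    return @responses;
}

# Three fake nodes with different response times.
my ( @nodes, @pids );
for my $spec ( [ node_a => 0.6 ], [ node_b => 0.4 ], [ node_c => 0.2 ] ) {
    my ( $fh, $pid ) = spawn_fake_node(@$spec);
    push @nodes, [ $spec->[0], $fh ];
    push @pids, $pid;
}

my $start     = time();
my @responses = scatter_gather( 'foo', 5, @nodes );
my $elapsed   = time() - $start;
waitpid( $_, 0 ) for @pids;

print "$_->{name}: $_->{reply}\n" for @responses;
printf "gathered %d replies in %.2fs\n", scalar @responses, $elapsed;
```

With the sequential loop the wait would be about the sum of the three
delays; here it should be close to the slowest node alone, and the replies
come back in arrival order rather than request order.  Merging them into a
single sensible result (what PolySearcher does today) is left out.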

I would like to thank you for your responsiveness on this list; you
and others have been top notch.  Even though it's a bit disappointing
that the cluster search functionality doesn't work 100% right now, the
level of commitment is gratifying to see, and it's reassuring to know
that this shortcoming will be addressed soon (hopefully before we
launch our service... ;)

I feel vindicated in my decision to move from another library to Lucy.  w00t!
