On 11/5/11, Marvin Humphrey <[email protected]> wrote:
> // This line triggers a call to the top_docs() subroutine within
> // SearchClient.pm. It blocks until top_docs() returns, and thus the
> // total time to process all remote requests in this loop is the sum
> // of all child node response times.
/releases held breath

My faith in Lucy is restored :), I was dreading a response to the effect that
the remote search stuff was immutable and couldn't be significantly improved.

> To process the searches in parallel, we need a select loop[1]. However,
> PolySearcher can only access SearchClient via the abstract
> Lucy::Search::Searcher interface -- it knows nothing about the socket calls
> that are being made by SearchClient.pm. PolySearcher would have to pierce
> encapsulation in order to get at those sockets and multiplex the requests.

Yup, sounds like that approach is buggered.

> The most straightforward solution is to eliminate PolySearcher from the
> equation and to create a class that combines the functionality of
> PolySearcher and SearchClient. Fortunately, neither of them is particularly
> large or complex, so the task is very doable.
>
> I propose that we name this new class LucyX::Remote::ClusterSearcher.
>
> * Fork SearchClient.pm to ClusterSearcher.pm and t/510-remote.t to
>   t/550-cluster_searcher.t.
> * Give ClusterSearcher the ability to talk to multiple SearchServers.
> * Change to a two-stage RPC mechanism:
>   1. Fire off the requests to the individual SearchServers in a "for" loop.
>   2. Gather the responses into an array using a select() loop (powered by
>      an IO::Select object).
> * Adapt each of the Searcher methods that ClusterSearcher implements to
>   assemble a sensible return value from the array of responses using
>   PolySearcher's techniques.
>
> This won't be the end of our iterating if we want to build a robust
> clustering system, because it doesn't yet address either node availability
> issues or near-real-time updates.
> However, it provides the functionality that we meant to make available via
> PolySearcher/SearchServer/SearchClient, allowing Goran to evaluate whether
> the system meets his basic requirements, and moves us incrementally towards
> a highly desirable goal: a ClusterSearcher object backed by multiple search
> nodes that is just as easy to use as an IndexSearcher backed by one index on
> one machine.
>
> PS: Goran...I'm under the weather right now, so if you're counting on me to
> code this up, I'm not sure how quickly I'll get to it.

No worries, you take care of yourself and get better.

I would like to thank you for your responsiveness on this list, you and others
have been top notch. Even though it's a bit disappointing that the cluster
search functionality doesn't work 100% right now, the level of commitment is
gratifying to see, and it's reassuring to know that this shortcoming will be
addressed soon (hopefully before we launch our service... ;)

I feel vindicated in my decision to move from another library to Lucy. w00t!
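
To make the two-stage mechanism quoted above concrete, here is a minimal sketch of the pattern: send every request first, then gather replies with a select() loop, so the total wait is roughly the slowest node rather than the sum of all nodes. It is written in Python rather than Perl purely for illustration and is not Lucy code. The names fake_search_server, scatter_gather and demo, the one-message wire format, and the delays are all assumptions made up for this example. A real client would also need proper message framing.

```python
import select
import socket
import threading
import time


def fake_search_server(conn, node_id, delay):
    """Hypothetical stand-in for a search node: read a query, wait, reply."""
    query = conn.recv(1024).decode()
    time.sleep(delay)  # simulate the time spent searching
    conn.sendall(f"node{node_id}: hits for {query!r}".encode())
    conn.close()


def scatter_gather(socks, query, timeout=5.0):
    """Send `query` to every socket, then collect replies via select()."""
    # Stage 1: fire off all requests without waiting for any reply.
    for sock in socks:
        sock.sendall(query.encode())

    # Stage 2: gather replies as they become readable, in any order,
    # storing each one at the position of the node it came from.
    responses = [None] * len(socks)
    index_of = {sock: i for i, sock in enumerate(socks)}
    pending = set(socks)
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # nodes that never answered keep a None entry
        readable, _, _ = select.select(list(pending), [], [], remaining)
        for sock in readable:
            # Assumption: each reply fits in a single recv() call.
            responses[index_of[sock]] = sock.recv(4096).decode()
            pending.discard(sock)
    return responses


def demo():
    """Run three fake nodes and time one scatter/gather round."""
    delays = [0.4, 0.2, 0.3]  # sequential total would be about 0.9 s
    clients = []
    for node_id, delay in enumerate(delays):
        client, server = socket.socketpair()
        threading.Thread(
            target=fake_search_server,
            args=(server, node_id, delay),
            daemon=True,
        ).start()
        clients.append(client)

    start = time.monotonic()
    responses = scatter_gather(clients, "foo")
    elapsed = time.monotonic() - start
    for client in clients:
        client.close()
    return responses, elapsed


if __name__ == "__main__":
    results, seconds = demo()
    for line in results:
        print(line)
    print(f"elapsed: {seconds:.2f}s")
```

Merging the gathered responses into a single result (the last bullet in the quoted proposal) is deliberately left out here; the sketch only covers the request and gather stages.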
