Hi,

I'm considering changing our established caching mechanism to allow
for more nimble cache refreshing (i.e., when the backend indexes change
beyond some threshold X).  Instead of caching in our reverse-proxy
cluster, I'd like to cache the $response on each remote searcher node.

My idea is to splice into LucyX/Remote/SearchServer.pm's sub serve():

# Process the method call.
read( $client_sock, $buf, 4 );
$len = unpack( 'N', $buf );
read( $client_sock, $buf, $len );
my $response   = $dispatch{$method}->( $self, thaw($buf) );
my $frozen     = nfreeze($response);
my $packed_len = pack( 'N', bytes::length($frozen) );
print $client_sock $packed_len . $frozen;


becomes:

# Process the method call.
read( $client_sock, $buf, 4 );
$len = unpack( 'N', $buf );
read( $client_sock, $buf, $len );
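#---------incision start----------
# md5_hex comes from Digest::MD5 (needs "use Digest::MD5 qw( md5_hex );").
# Key on the method name as well, since $buf holds only the frozen args.
# TODO: check if $buf is the search string
my $response;
my $cached_object_id = md5_hex( $method . ':' . $buf );

if (is_cached($cached_object_id)) {
    $response = read_cached_object($cached_object_id);
}
else {
    $response   = $dispatch{$method}->( $self, thaw($buf) );
    # write_cached_object() is a placeholder, like the two helpers above.
    write_cached_object( $cached_object_id, $response );
}
#---------incision end----------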
my $frozen     = nfreeze($response);
my $packed_len = pack( 'N', bytes::length($frozen) );
print $client_sock $packed_len . $frozen;
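
To make the idea concrete, here is a minimal standalone sketch of what
the cache helpers could look like.  Everything in it is an assumption
on my part: %cache, cache_key(), cached_dispatch() and clear_cache()
are made-up names, the store is just an in-process hash, and nothing
here is part of LucyX::Remote::SearchServer.  The one detail I think
matters is that Storable does not promise a stable hash key order, so
the sketch re-freezes the args with $Storable::canonical set before
hashing them.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Storable qw( nfreeze thaw );

my %cache;    # hypothetical in-process store: key => frozen response

# Key on the method name plus the args, re-frozen canonically so that
# hash key order cannot produce different keys for equal args.
sub cache_key {
    my ( $method, $buf ) = @_;
    local $Storable::canonical = 1;
    return md5_hex( $method . "\0" . nfreeze( thaw($buf) ) );
}

# Return the cached response for this call, or run $handler once and
# remember its (frozen) result.  $handler stands in for $dispatch{$method}.
sub cached_dispatch {
    my ( $method, $buf, $handler ) = @_;
    my $key = cache_key( $method, $buf );
    if ( !exists $cache{$key} ) {
        $cache{$key} = nfreeze( $handler->( thaw($buf) ) );
    }
    return thaw( $cache{$key} );
}

# Drop everything, e.g. after the index has changed.
sub clear_cache { %cache = (); return }
```

Since serve() freezes the response before writing it to the socket
anyway, storing the frozen bytes would also let a cache hit skip the
nfreeze() call entirely.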

....

I seem to recall, though, that a typical search is not an atomic
transaction; i.e., the remote search protocol is broken up into
discrete request/response chunks:


my $hits = $poly_searcher->hits(
    query      => $parsed_query,
    sort_spec  => $sort_spec,
    offset     => 0,  # or 10, 20, etc
    num_wanted => 10,
);


is processed roughly as:

doc_max/response
doc_freq/response x 31
...
top_docs/response
fetch_doc/response x 10
...
done
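
One way to look at that sequence is to treat every message as its own
cache entry, keyed by method name plus arguments.  The toy replay below
is only a sketch under that assumption: backend(), handle() and
replay_search() are made-up names, the argument strings are
placeholders rather than real Lucy arguments, and I have not verified
that each message really is a pure function of its arguments for a
given index snapshot.

```perl
use strict;
use warnings;

# Hypothetical policy: which protocol messages are worth caching.
my %cacheable = map { $_ => 1 } qw( doc_max doc_freq top_docs fetch_doc );

my %cache;
my %stats = ( backend => 0, hits => 0 );

# Stand-in for the real dispatch; counts how often the index is consulted.
sub backend {
    my ( $method, $arg ) = @_;
    $stats{backend}++;
    return "$method($arg)";
}

# Serve one message, from the cache when the policy allows it.
sub handle {
    my ( $method, $arg ) = @_;
    return backend( $method, $arg ) unless $cacheable{$method};
    my $key = "$method\0$arg";
    if ( exists $cache{$key} ) {
        $stats{hits}++;
        return $cache{$key};
    }
    return $cache{$key} = backend( $method, $arg );
}

# Replay the message sequence listed above for one hits() call.
sub replay_search {
    handle( 'doc_max', '' );
    handle( 'doc_freq', "term$_" ) for 1 .. 31;
    handle( 'top_docs', 'query;offset=0;num_wanted=10' );
    handle( 'fetch_doc', $_ ) for 1 .. 10;
    return;
}

replay_search() for 1 .. 2;
printf "backend calls: %d, cache hits: %d\n", $stats{backend}, $stats{hits};
```

With the counts above (1 + 31 + 1 + 10 = 43 messages per search), the
first pass misses 43 times and the second pass hits 43 times.  If
offset and num_wanted end up in the top_docs arguments, each results
page would get its own entry under this scheme.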

So, my question is basically: which parts do I cache, and what's the
best way to identify those parts?  I have a feeling I'm going to have
to package a group of requests/responses and cache it in its
entirety, or something along those lines.  Or maybe this is not
feasible within the given framework.
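
If grouping turns out to be the issue, one alternative I can imagine is
stamping every cache entry with an index "generation", so that all
entries from the same index state expire together once the index has
changed beyond threshold X.  This is only a sketch: $THRESHOLD,
note_index_changes(), cache_get() and cache_put() are hypothetical, and
how the searcher node would actually learn about index changes is left
open.

```perl
use strict;
use warnings;

my %cache;               # key => { gen => N, value => ... }
my $generation = 0;      # bumped once enough index changes accumulate
my $pending    = 0;      # changes seen since the last bump
my $THRESHOLD  = 100;    # stand-in for "threshold X"

# Record index changes; start a new generation once the threshold is met.
sub note_index_changes {
    my ($count) = @_;
    $pending += $count;
    if ( $pending >= $THRESHOLD ) {
        $generation++;
        $pending = 0;
    }
    return $generation;
}

# Return the cached value, or undef if missing or from an old generation.
sub cache_get {
    my ($key) = @_;
    my $entry = $cache{$key};
    return undef unless $entry;
    if ( $entry->{gen} != $generation ) {
        delete $cache{$key};    # stale: drop it lazily
        return undef;
    }
    return $entry->{value};
}

# Store a value under the current generation.
sub cache_put {
    my ( $key, $value ) = @_;
    $cache{$key} = { gen => $generation, value => $value };
    return $value;
}
```

A search whose messages straddle a generation bump could still mix
answers from two index states, so this does not by itself make the
whole exchange atomic.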

I essentially need a better understanding of the client/server
interaction process so I can formulate an approach to achieve
remote-end caching of search queries (in Perl of course, since that's
what's being used here).


Comments?

thanks
