Hi,
I'm considering changing our established caching mechanism to allow
for more nimble cache refreshing (i.e., when the backend indexes change
beyond threshold X). Instead of caching in our reverse-proxy
cluster, I'd like to cache the $response on each remote searcher node.
My idea is to splice into sub serve() in LucyX/Remote/SearchServer.pm:
    # Process the method call.
    read( $client_sock, $buf, 4 );
    $len = unpack( 'N', $buf );
    read( $client_sock, $buf, $len );
    my $response = $dispatch{$method}->( $self, thaw($buf) );
    my $frozen = nfreeze($response);
    my $packed_len = pack( 'N', bytes::length($frozen) );
    print $client_sock $packed_len . $frozen;
becomes:
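    # Process the method call.
    read( $client_sock, $buf, 4 );
    $len = unpack( 'N', $buf );
    read( $client_sock, $buf, $len );
    #---------incision start----------
    # Requires "use Digest::MD5 qw( md5_hex );" at the top of the module.
    my $response;
    # Key on the method name as well as the frozen args, so that different
    # methods called with identical args do not share a cache entry.
    # TODO: check if $buf is the search string
    my $cached_object_id = md5_hex( $method . "\0" . $buf );
    if ( is_cached($cached_object_id) ) {
        $response = read_cached_object($cached_object_id);
    }
    else {
        $response = $dispatch{$method}->( $self, thaw($buf) );
        # TODO: store $response under $cached_object_id here.
    }
    #---------incision end----------
    my $frozen = nfreeze($response);
    my $packed_len = pack( 'N', bytes::length($frozen) );
    print $client_sock $packed_len . $frozen;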
....
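To make the idea concrete, here is a minimal, self-contained sketch of
the lookup-or-dispatch step. It is not the SearchServer.pm code: the
%cache hash, the cached_dispatch() helper and the toy doc_freq handler
are all stand-ins I made up for illustration, and a real cache would
also need eviction/refresh logic. One thing it does show is that
Storable does not sort hash keys unless $Storable::canonical is set, so
two logically identical requests may not freeze to identical bytes.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Storable qw( nfreeze thaw );

# Sort hash keys when freezing, so equal requests give equal bytes.
# This only affects the process doing the freezing; in the real setup
# that is the client, so this is an assumption about the client side.
$Storable::canonical = 1;

my %cache;        # hypothetical in-memory store: key => frozen response
my $calls = 0;    # counts how often the backend is actually hit

# Hypothetical stand-in for %dispatch in SearchServer.pm.
my %dispatch = (
    doc_freq => sub {
        my ( $self, $args ) = @_;
        $calls++;
        return { retval => length $args->{term} };
    },
);

# Return the frozen response for ($method, $buf), computing it on a miss.
sub cached_dispatch {
    my ( $self, $method, $buf ) = @_;
    my $key = md5_hex( $method . "\0" . $buf );
    if ( !exists $cache{$key} ) {
        my $response = $dispatch{$method}->( $self, thaw($buf) );
        $cache{$key} = nfreeze($response);
    }
    return $cache{$key};
}

# The same request twice: the second call should be served from %cache.
my $buf    = nfreeze( { field => 'content', term => 'foo' } );
my $first  = cached_dispatch( undef, 'doc_freq', $buf );
my $second = cached_dispatch( undef, 'doc_freq', $buf );
```

Caching the already-frozen response means a hit can be written straight
to the socket without another nfreeze() call.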
I seem to recall, though, that a typical search is not an atomic
transaction; i.e., the remote search protocol is broken up into
discrete request/response chunks. For instance,
    my $hits = $poly_searcher->hits(
        query      => $parsed_query,
        sort_spec  => $sort_spec,
        offset     => 0,    # or 10, 20, etc
        num_wanted => 10,
    );
is processed roughly as:
    doc_max/response
    doc_freq/response x 31
    ...
    top_docs/response
    fetch_doc/response x 10
    ...
    done
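Each of those exchanges appears to use the framing seen in serve()
above: a 4-byte big-endian length followed by a Storable-frozen
payload. Below is a small sketch of one such frame being written and
read back. The write_frame()/read_frame() helpers are names I invented,
and an in-memory filehandle stands in for $client_sock; how the method
name itself travels over the wire is not covered here.

```perl
use strict;
use warnings;
use Storable qw( nfreeze thaw );

# Write one frame: 4-byte big-endian length, then the frozen payload.
sub write_frame {
    my ( $fh, $data ) = @_;
    my $frozen = nfreeze($data);
    print {$fh} pack( 'N', length $frozen ) . $frozen;
}

# Read one frame back and thaw it.
sub read_frame {
    my ($fh) = @_;
    my ( $buf, $len );
    read( $fh, $buf, 4 ) == 4 or die "short read on length prefix";
    $len = unpack( 'N', $buf );
    read( $fh, $buf, $len ) == $len or die "short read on payload";
    return thaw($buf);
}

# Hypothetical stand-in for $client_sock: an in-memory filehandle.
my $wire = '';
open my $out, '>', \$wire or die $!;
write_frame( $out, { retval => 42 } );
write_frame( $out, { retval => 7 } );
close $out;

# Two frames written back to back come out as two separate responses.
open my $in, '<', \$wire or die $!;
my $first  = read_frame($in);
my $second = read_frame($in);
```

Because every response is a self-delimiting frame, each request/response
pair could in principle be cached on its own rather than as a group.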
So, my question is basically: which parts do I cache, and what's the
best way to identify those parts? I have a feeling I'm going to have
to package a group of requests/responses and cache it in its
entirety, or something like that. Or maybe this is not feasible within
the given framework.
I essentially need a better understanding of the client/server
interaction process so I can formulate an approach to achieve
remote-end caching of search queries (in Perl of course, since that's
what's being used here).
Comments?
thanks