searching multiple remote indices

Christian Reuschling Wed, 18 Jun 2014 08:11:37 -0700

Hi,
we are currently migrating from Lucene 3.5.0 to Lucene 4. So far so good,
but one project needs to access multiple indices, some of which may be
remote. In the past we solved this with the Searcher API: we implemented a
subclass that makes remote calls to the corresponding server instance. With
MultiSearcher it was then easy to mix local and remote indices behind one
transparent Searcher instance that performs a distributed search.


In Lucene 4, Searcher and MultiSearcher have been removed. The recommended
replacement is to aggregate the indices with a MultiReader and build an
IndexSearcher on top of it. This is fine for local indices, but we wonder
how to proceed with remote ones.
Subclassing IndexSearcher is no longer a solution, since there is no
MultiSearcher left to do the aggregation.
Subclassing IndexReader might work, but some of its methods are declared
final (document(int), etc.), and we are not sure whether overriding only the
non-final methods is enough.
We are also unsure about performance with remote IndexReader proxy objects,
because a search could then have to transfer a lot of information over the
wire, far more than when aggregating result lists of, e.g., 20 hits each.
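To make the comparison concrete: aggregating result lists only moves a
handful of (index, doc id, score) triples per index. Below is a minimal
sketch of such a merge in plain Java. ShardHitMerger and its Hit type are
made-up names for illustration, not Lucene classes, and the sketch assumes
that the scores from all indices are comparable, which only holds if every
index scores with the same term statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: merges per-index top-N hit lists into one global top-N list. */
final class ShardHitMerger {

    /** One hit as an index might report it; all fields are illustrative. */
    static final class Hit {
        final int shard;    // which index the hit came from
        final int doc;      // document id local to that index
        final float score;  // score computed by that index

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Highest score first; ties broken by shard, then doc, to keep the order deterministic. */
    private static final Comparator<Hit> BY_SCORE_DESC = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            int c = Float.compare(b.score, a.score);
            if (c != 0) {
                return c;
            }
            c = Integer.compare(a.shard, b.shard);
            return c != 0 ? c : Integer.compare(a.doc, b.doc);
        }
    };

    /** Collects the hits of all indices, sorts them, and keeps the best topN. */
    static List<Hit> merge(List<List<Hit>> perShardHits, int topN) {
        List<Hit> all = new ArrayList<Hit>();
        for (List<Hit> hits : perShardHits) {
            all.addAll(hits);
        }
        Collections.sort(all, BY_SCORE_DESC);
        return new ArrayList<Hit>(all.subList(0, Math.min(topN, all.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> perShard = new ArrayList<List<Hit>>();
        perShard.add(Arrays.asList(new Hit(0, 7, 2.5f), new Hit(0, 3, 1.0f))); // local index
        perShard.add(Arrays.asList(new Hit(1, 4, 3.0f), new Hit(1, 9, 0.5f))); // remote index
        for (Hit h : merge(perShard, 3)) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Lucene itself also has TopDocs.merge for combining per-index TopDocs; the
sketch only shows how little data such a merge needs compared to proxying
every IndexReader call.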

Another idea is to implement a Directory subclass that makes remote calls.
We are not sure whether this is feasible either. It would avoid the final
method problem, but it may have similar performance issues, if performance
turns out to be critical.

Because we are migrating an existing system rather than implementing
something from scratch, changing technology, for example switching to a
Solr backend, is not an option for us. On the other hand, maybe there are
Solr classes for distributed search that we could use in place of
MultiSearcher.

Simply aggregating the search result lists in a small class of our own with
a search(..) method is not enough either, since we use more searcher
functionality that would then also have to be aggregated, namely:
- createNormalizedWeight(query), called by someQuery.weight(searcher)
- rewrite(query), to get the atomic queries. MultiSearcher implemented this
  with query.combine(query), which is no longer available either.
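To illustrate why the weight has to be aggregated: a weight normalized
across all indices needs term statistics summed over all of them, otherwise
each index scores with its own idf. The following is a minimal sketch in
plain Java, assuming each index can report docFreq and maxDoc for a term.
GlobalTermStats and ShardStats are hypothetical names, not Lucene classes,
and the idf formula is the classic 1 + ln(numDocs / (docFreq + 1)) form,
used here only as an example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: sums per-index term statistics so all indices can share one idf. */
final class GlobalTermStats {

    /** Statistics one index reports for a single term; names are illustrative. */
    static final class ShardStats {
        final long docFreq; // documents in this index that contain the term
        final long maxDoc;  // total number of documents in this index

        ShardStats(long docFreq, long maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Adds up docFreq and maxDoc over all indices. */
    static ShardStats combine(List<ShardStats> shards) {
        long docFreq = 0;
        long maxDoc = 0;
        for (ShardStats s : shards) {
            docFreq += s.docFreq;
            maxDoc += s.maxDoc;
        }
        return new ShardStats(docFreq, maxDoc);
    }

    /** Example idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        ShardStats local = new ShardStats(10, 1000);  // term is rare in the local index
        ShardStats remote = new ShardStats(90, 1000); // term is common in the remote index
        ShardStats global = combine(Arrays.asList(local, remote));

        // Each index alone would weight the same term differently.
        System.out.println("local idf:  " + idf(local.docFreq, local.maxDoc));
        System.out.println("remote idf: " + idf(remote.docFreq, remote.maxDoc));
        // One shared value makes the scores of both indices comparable.
        System.out.println("global idf: " + idf(global.docFreq, global.maxDoc));
    }
}
```

With a shared value like this, hits from local and remote indices can be
merged by score; without it, the same term counts for more in the index
where it happens to be rare.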


Does anybody have best practices for this? To us it does not sound like an
exotic case. Or is it?


Thanks from the whole DFKI Lucene crew!

Christian


-- 
______________________________________________________________________________
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany

Phone: +49.631.20575-1250
mailto:reuschl...@dfki.de  http://www.dfki.uni-kl.de/~reuschling/

------------Legal Company Information Required by German Law------------------
Geschäftsführung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
                  Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
______________________________________________________________________________