Glen Stampoultzis wrote:
Anyone have any strategies for dealing with this?  I'm wondering whether
it's better to replicate searchable fields in the lucene index.  This means
being very careful that updates get done in two places so it is not ideal.

If you *can* manage to update your index whenever the "real" database is
updated, then indexing the database fields into the lucene index is the way
to go: it makes searching easier (no lucene-db joins) and faster (way faster!).

You could always "batch" the updates of the index. That is, if you can
put a "last modified" date on each row of the database, then updating
the lucene index is easy (the index just needs to remember the last date
at which it was updated).
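Here is a rough sketch of that batching idea in plain Java, just to show the
bookkeeping. The class and method names (BatchIndexUpdater, Row, collectStale)
are made up for illustration, and the rows are held in a List so the example
stands alone. In practice you would probably let the database do the
filtering with a "where last_modified > ?" clause, and then delete and re-add
each returned row in the lucene index.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tracks the date the index was last brought up to date.
final class BatchIndexUpdater {

    // A database row reduced to what the batch update needs (assumed shape).
    static final class Row {
        final int pk;
        final Instant lastModified;

        Row(int pk, Instant lastModified) {
            this.pk = pk;
            this.lastModified = lastModified;
        }
    }

    // The "last date at which the index was updated".
    private Instant lastIndexed;

    BatchIndexUpdater(Instant lastIndexed) {
        this.lastIndexed = lastIndexed;
    }

    // Returns the rows modified since the last run and advances the remembered date.
    List<Row> collectStale(List<Row> rows) {
        List<Row> stale = new ArrayList<>();
        Instant newest = lastIndexed;
        for (Row row : rows) {
            if (row.lastModified.isAfter(lastIndexed)) {
                stale.add(row); // the caller re-indexes this pk
                if (row.lastModified.isAfter(newest)) {
                    newest = row.lastModified;
                }
            }
        }
        lastIndexed = newest;
        return stale;
    }

    Instant lastIndexed() {
        return lastIndexed;
    }
}
```

Advancing the remembered date to the newest row seen (rather than to "now")
means the result does not depend on the clock of the machine running the batch.
Rows that share exactly the newest timestamp but are written after the batch
ran would be missed, so a real version needs to think about that edge.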

If you *really* don't want to (or can't) put all the searchable fields
into lucene, then you are going to need to do a "lucene-db" join. This
is how I do it:

Say your DB contains rows with a primary key "pk" (say, an int; this is
stored as a Keyword in the lucene index), a field "name" (not indexed in
lucene) and a text blob "text" (this is what you have indexed in Lucene).

There are two basic joins you need to do: an "and" join and an "or" join.

"And" join: if you want to do a query like "select rows where
name='matt' and text contains 'foo'", then the code looks roughly like this:

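---
// Lucene side: collect the pk of every document whose text matches.
Hits hits = searcher.search(new TermQuery(new Term("text", "foo")));
Set hitPKs = new HashSet();
for (int i = 0; i < hits.length(); i++) {
  hitPKs.add(hits.doc(i).get("pk"));  // stored fields come back as Strings
}

// Database side: collect the pk of every row matching the SQL condition.
// "stmt" is assumed to be an open java.sql.Statement.
Set rsPKs = new HashSet();
ResultSet rs = stmt.executeQuery("select pk from TABLE where name='matt'");
while (rs.next()) {
  rsPKs.add(rs.getString("pk"));  // read as String to match the Lucene side
}

// Intersection: keep only the keys found on both sides.
Set results = new HashSet(hitPKs);
results.retainAll(rsPKs);
---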

Similarly, the "or" join takes the union of the two sets instead of the
intersection. If you want to preserve the order of the Hits, then collect
the lucene keys in a List or a LinkedHashSet instead of a plain Set.
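To make the two joins and the ordering point concrete, here is a small
self-contained sketch using only java.util. The class and method names
(PkJoin, andJoin, orJoin) are invented for this example, and it assumes you
have already pulled the primary keys out of the Hits (in ranked order) and out
of the ResultSet.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for joining Lucene hits with database rows by primary key.
final class PkJoin {
    private PkJoin() {}

    // "And" join: keys found by both sides, kept in the order Lucene ranked them.
    static Set<Integer> andJoin(List<Integer> hitPKs, Collection<Integer> rsPKs) {
        Set<Integer> results = new LinkedHashSet<>(hitPKs);
        results.retainAll(new HashSet<>(rsPKs)); // HashSet gives fast lookups
        return results;
    }

    // "Or" join: Lucene's keys first, in ranked order, then the database-only keys.
    static Set<Integer> orJoin(List<Integer> hitPKs, Collection<Integer> rsPKs) {
        Set<Integer> results = new LinkedHashSet<>(hitPKs);
        results.addAll(rsPKs); // keys already present keep their original position
        return results;
    }
}
```

For example, with hits ranked 7, 3, 9 and database rows 9 and 7, andJoin
returns the keys in the order 7, 9. A LinkedHashSet is used because it
remembers insertion order, so the ranking survives both operations.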

The code for a lucene-db join is very annoying: try to put all your
searchable fields into the index instead :D

=Matt




