"Eric Jain" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > If you *really* don't want to (or can't) put all the searchable fields
> > into lucene, then you are going to need to do a "lucene-db" join.
>
> Here are two good reasons:
>
> 1. Range queries
> 2. Sorting
>
> Yes, Lucene can do both, but I find that in both cases the approach
> Lucene uses is not suitable for large data sets, given limited hardware
> resources.
>
> > Hits hits = searcher.search(new TermQuery("text", "foo"))
> > Set hitPKs = new Set();
> > for each doc in hits:
> >     hitPKs.put(doc.getField("pk"))
>
> Retrieving even one custom field for every document of a possibly large
> data set can end up being very slow, it seems. This complicates things
> a lot...
>
> Unfortunately, I am not aware of any good solutions for combining Lucene
> with a relational database, given the requirements listed above.
> However, one promising approach may involve combining Lucene with the new
> Berkeley DB JE:
>
> 1. Use Lucene to create a bitset of results (position = docid).
> 2. Use BDB to iterate through primary keys, sorted and restricted by one
>    (or more?) of several criteria.
> 3. For each primary key, look up docid (this database must be rebuilt
>    every time the index is modified).
> 4. If docid set in result bitset, report result.
>
> If anyone has tried anything similar, I'd be interested to know!
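
The four steps quoted above can be sketched in plain Java without committing to a particular store. This is only an illustration under stated assumptions, not code from either poster: the class BitsetJoinSketch and its join method are hypothetical names, a TreeMap stands in for whatever ordered key store does the sorting and range restriction (BDB JE or otherwise), a HashMap stands in for the pk-to-docid lookup that has to be rebuilt whenever the index changes, and the BitSet is assumed to have been filled from a Lucene search already. For brevity the TreeMap holds one primary key per sort value.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the "bitset join"; all names are illustrative.
class BitsetJoinSketch {

    /**
     * @param hits      step 1: bit i is set if docid i matched the text query
     * @param byValue   step 2: sort value -> primary key, kept in sorted order
     *                  (stand-in for the ordered external store)
     * @param pkToDocId step 3: primary key -> current docid (must be rebuilt
     *                  every time the index is modified)
     * @param lo        inclusive lower bound of the range restriction
     * @param hi        inclusive upper bound of the range restriction
     * @return primary keys that matched the text query, in sort-value order
     */
    static List<String> join(BitSet hits,
                             NavigableMap<Integer, String> byValue,
                             Map<String, Integer> pkToDocId,
                             int lo, int hi) {
        List<String> results = new ArrayList<>();
        // Step 2: walk primary keys in sorted order, restricted to [lo, hi].
        for (String pk : byValue.subMap(lo, true, hi, true).values()) {
            // Step 3: translate the primary key into a docid.
            Integer docId = pkToDocId.get(pk);
            // Step 4: report the key only if its docid is in the hit bitset.
            if (docId != null && hits.get(docId)) {
                results.add(pk);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(2);

        NavigableMap<Integer, String> byPrice = new TreeMap<>();
        byPrice.put(10, "pk-c");
        byPrice.put(20, "pk-b");
        byPrice.put(30, "pk-a");
        byPrice.put(99, "pk-d");

        Map<String, Integer> pkToDocId = new HashMap<>();
        pkToDocId.put("pk-a", 0);
        pkToDocId.put("pk-b", 1);
        pkToDocId.put("pk-c", 2);
        pkToDocId.put("pk-d", 3);

        // Range 10..50, sorted by price, filtered by the hit bitset.
        System.out.println(join(hits, byPrice, pkToDocId, 10, 50)); // prints [pk-c, pk-a]
    }
}
```

Nothing in the loop depends on how the ordered iteration is produced, so the same shape would apply to a SQL "ORDER BY ... WHERE ... BETWEEN" cursor. The cost that remains is keeping the pk-to-docid mapping in sync with the index.
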

Why Berkeley DB? This sounds like it would work regardless of the database.

Regards,
Glen
