Interesting. I am reviewing the JavaDocs.

My sense is that transactional integrity may not actually be an important 
requirement here. That will probably raise the hackles of database experts 
like the Derby team, but I would prefer non-transactional support for 
Lucene-Derby integration.

Please consider the following things I would like:

        1. the ability to search an entire *row* as a document, not a
           column-as-document model (see the indexing sketch right after
           this list)
        2. the ability to "look back in time" and see old versions of rows
        3. very high performance
        4. high-quality search results (in my experience you must combine
           FuzzyLikeThisQuery with SnowballAnalyzer; the search sketch
           further below shows the combination).
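
To make (1) and the analyzer point in (4) concrete, here is a minimal
indexing sketch. It assumes Lucene 2.4 plus the contrib snowball jar; the
field names ("rowId", "content"), the index path, and the row-ID format
are all made up for illustration:

    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class RowIndexer {
        // Index one database row as a single Lucene document
        // (row-as-document, not column-as-document).
        public static void indexRow(IndexWriter writer, String rowId,
                                    String[] columnValues) throws Exception {
            Document doc = new Document();
            // Store the row ID verbatim so a hit can be mapped back to
            // the database row; don't analyze it.
            doc.add(new Field("rowId", rowId,
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Concatenate all column values into one searchable field,
            // so the whole row is the document.
            StringBuilder row = new StringBuilder();
            for (String value : columnValues) {
                row.append(value).append(' ');
            }
            doc.add(new Field("content", row.toString(),
                              Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }

        public static void main(String[] args) throws Exception {
            // SnowballAnalyzer stems terms at index time; the same
            // analyzer must be used again at query time.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/tmp/row-index"),
                    new SnowballAnalyzer("English"),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);
            indexRow(writer, "employee:42",
                     new String[] { "Jane", "Smith", "50000" });
            writer.close();
        }
    }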

In my view, the Lucene integration is more like a system that indexes a 
constant stream of information entering the database. Transactions are 
nice to have, but not really needed to achieve the equivalent of a "search 
engine for the database". When I execute a Lucene search, all I need back 
are row IDs that I can use to lazily retrieve the rows if the user wants to 
drill down on a particular search result. With a web search engine like 
Google, it is possible that the page no longer exists when the user clicks 
on the search result (it happens from time to time).
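
Here is a matching search sketch under the same assumptions (Lucene 2.4;
FuzzyLikeThisQuery comes from the contrib queries jar; "content" and
"rowId" are the hypothetical field names from the indexing sketch above).
It returns only the stored row IDs, so the rows themselves can be fetched
lazily:

    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.similar.FuzzyLikeThisQuery;

    public class RowSearcher {
        // Run a fuzzy, stemmed search and return the stored row IDs only;
        // the caller can lazily fetch each row from Derby on demand.
        public static String[] searchRowIds(String queryText, int maxHits)
                throws Exception {
            IndexSearcher searcher = new IndexSearcher("/tmp/row-index");
            // FuzzyLikeThisQuery fuzzifies each term; SnowballAnalyzer
            // stems the query the same way the rows were stemmed at
            // index time.
            FuzzyLikeThisQuery query =
                    new FuzzyLikeThisQuery(32, new SnowballAnalyzer("English"));
            query.addTerms(queryText, "content", 0.5f, 2);

            TopDocs hits = searcher.search(query, null, maxHits);
            String[] rowIds = new String[hits.scoreDocs.length];
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                rowIds[i] = searcher.doc(hits.scoreDocs[i].doc).get("rowId");
            }
            searcher.close();
            return rowIds;
        }
    }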

This is why I don't think we really *need* transactional integrity on the 
Lucene search.

 -geoff
“XML? Too much like HTML. It'll never work on the Web!” 
-anonymous 
________________________________
From: Rick Hillegas <[email protected]>
To: Derby Discussion <[email protected]>
Sent: Tuesday, March 24, 2009 11:55:38 AM
Subject: Re: Lucene integration

Hi Geoffrey,

I'm hoping to have some time to look at Lucene integration after we put 10.5 
to bed. In the meantime, I was wondering whether you have any experience with 
implementations of Lucene's Directory which place the Lucene indexes inside a 
relational database? According to the following link, people have been 
disappointed with the performance of this approach (I don't know what that 
means). At first blush, however, the approach seems like an attractive way to 
keep the Lucene indexes transactionally consistent with the original character 
data:

http://wiki.apache.org/lucene-java/LuceneFAQ#head-e55d8e6971f9f01daaf3e14ce1d2f34485adba6e
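
(As a toy illustration of the storage idea only, not of the Directory
implementations that FAQ entry discusses: one could copy finished index
files into BLOB rows in the same transaction as the indexed character
data. The LUCENE_FILES table below is invented for the example.)

    import java.io.File;
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class IndexToBlobs {
        // Copy each Lucene index file into a BLOB row. Because the
        // inserts run in the caller's transaction, the index bytes and
        // the character data they index commit (or roll back) together.
        public static void storeIndex(Connection conn, File indexDir)
                throws Exception {
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO LUCENE_FILES (NAME, CONTENT) VALUES (?, ?)");
            for (File f : indexDir.listFiles()) {
                ps.setString(1, f.getName());
                FileInputStream in = new FileInputStream(f);
                ps.setBinaryStream(2, in, (int) f.length());
                ps.executeUpdate();
                in.close();
            }
            ps.close();
            conn.commit(); // index files and row data commit together
        }
    }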

Thanks,
-Rick

Rick Hillegas wrote:
> Hi Geoffrey,
> 
> I'm on the road right now but I'd like to make some suggestions after I 
> gather my thoughts and get over my jet lag. I think that it is definitely 
> possible to hook into the query processing layer in order to fork the tuple 
> stream so that a listener process can populate the Lucene indexes. I think 
> that scraping the replication log stream would raise a lot of issues around 
> when work is really committed vs. when savepoints are rolled back, and I 
> would recommend against that approach.
> 
> Regards,
> -Rick
> 
> Geoffrey Hendrey wrote:
>> Ok, well on to plan B then. Is there some stage in the preparation of 
>> inserts, updates, and deletes at which the logical identity of a row is 
>> established? That could be a good place to provide a Lucene hook, or a more 
>> general interceptor.
>> 
>> 
>> On Mar 18, 2009, at 6:55 AM, Jørgen Løland <[email protected]> wrote:
>> 
>> Geoff hendrey wrote:
>>> I've been following Knut's pointers and reading the docs on the classes
>>> that marshal themselves over the wire via their writeObject method. So,
>>> a question about this:
>>> "Type=update, Table=employee, Page=4321, Index=4, field 3=50000"
>>> Do the page and index, collectively, constitute a "row ID"? If it is
>>> always constant, then these three fields are sufficient to permanently
>>> identify the row, and we can use that information to constitute a
>>> document ID in Lucene.
>> 
>> It's constant until the record is moved to another page (which means
>> "no", really).
> 
