Geoff Hendrey wrote:
Interesting. I am reviewing the JavaDocs.
I am concerned that transactional integrity might not be an important
requirement. That probably will raise the hackles of database experts
like the Derby team, but I would prefer a non-transactional support
for Lucene-Derby integration.
Thanks, Geoff. I think that a successful feature needs a real user like you.
Please consider the following things I would like:
1. ability to search an entire *row* as a document, not a
column-as-document model
OK. This makes sense to me. Each column would be a Lucene field, if I am
understanding the terms correctly.
2. ability to "look back in time" and see old versions of rows
This is interesting. What would you like the key to be? A two part
invention composed of the table's primary key plus a timestamp?
Something else?
What about aborted changes? Would you be happy with a solution which
recorded index entries for changes which were subsequently discarded
because, say, an INSERT integrity violation rolled back a transaction to
the savepoint laid down before the INSERT ran?
Thanks,
-Rick
3. very high performance
4. high quality search results (from my experience you must combine
FuzzyLikeThisQuery with SnowballAnalyzer).
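As a rough illustration of the row-as-document idea and the "primary key plus timestamp" key raised above, here is a minimal sketch in plain Java. It does not use the Lucene or Derby APIs; `RowDocument`, `RowDocumentSketch`, `versionedId`, and the `table:key@millis` ID format are all hypothetical names chosen for the example.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a search document: one field per column. */
final class RowDocument {
    final String documentId;            // primary key plus change timestamp
    final Map<String, String> fields;   // column name -> text value

    RowDocument(String documentId, Map<String, String> fields) {
        this.documentId = documentId;
        this.fields = fields;
    }
}

final class RowDocumentSketch {
    /** Builds a versioned document ID from the primary key and the change time. */
    static String versionedId(String table, String primaryKey, Instant changedAt) {
        return table + ":" + primaryKey + "@" + changedAt.toEpochMilli();
    }

    /** Maps every non-null column of a row to a field of the same name. */
    static RowDocument toDocument(String table, String primaryKey,
                                  Instant changedAt, Map<String, Object> row) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            if (column.getValue() != null) {   // skip SQL NULLs
                fields.put(column.getKey(), column.getValue().toString());
            }
        }
        return new RowDocument(versionedId(table, primaryKey, changedAt), fields);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("NAME", "Ada");
        row.put("SALARY", 50000);
        RowDocument doc = toDocument("EMPLOYEE", "42", Instant.ofEpochMilli(1000L), row);
        System.out.println(doc.documentId + " " + doc.fields);
    }
}
```

Because each change to a row gets its own timestamped ID, older versions would stay searchable next to the current one, which is one possible reading of "look back in time".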
In my view, the Lucene integration is more like a system that indexes
a constant stream of information that enters the database.
Transactions are nice-to-have, but not really needed to achieve the
equivalent of a "search engine for the database". When I execute a
Lucene search, all I need to get back are row IDs that I can use to
lazy-retrieve the row if the user wants to drill down on a particular
search result. With a web search engine like Google, it is possible
that the page may no longer exist when the user clicks on the search
result (it happens from time to time).
This is why I don't think we really *need* transactional integrity on
the Lucene search.
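The lazy-retrieval pattern described here can be sketched roughly as follows, assuming the search returns only row IDs and that a hit may point at a row that has since been deleted. The in-memory map stands in for the database table, and `StaleHitSketch`, `fetch`, and `resolve` are hypothetical names, not part of Derby or Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class StaleHitSketch {
    /** Hypothetical stand-in for the table: row ID -> row contents. */
    private final Map<Long, String> table = new HashMap<>();

    void insert(long rowId, String contents) { table.put(rowId, contents); }

    void delete(long rowId) { table.remove(rowId); }

    /** Lazy lookup: empty when the row was deleted after it was indexed. */
    Optional<String> fetch(long rowId) {
        return Optional.ofNullable(table.get(rowId));
    }

    /** Resolves search hits to rows, dropping hits whose row no longer exists. */
    List<String> resolve(List<Long> hitRowIds) {
        List<String> rows = new ArrayList<>();
        for (long rowId : hitRowIds) {
            fetch(rowId).ifPresent(rows::add);
        }
        return rows;
    }
}
```

The index is treated as a hint rather than as the source of truth: a stale hit costs one failed lookup, much like a dead link in a web search result.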
-geoff
“XML? Too much like HTML. It'll never work on the Web!”
-anonymous
*From:* Rick Hillegas
*To:* Derby Discussion
*Sent:* Tuesday, March 24, 2009 11:55:38 AM
*Subject:* Re: Lucene integration
Hi Geoffrey,
I'm hoping to have some time to look at Lucene integration after we
put 10.5 to bed. In the meantime, I was wondering if you have any
experience with implementations of Lucene Directory which place the
Lucene indexes inside a relational database? According to the
following link, people have been disappointed with the performance of
this approach (don't know what that means)--at first blush, however,
the approach seems like an attractive way to keep the Lucene indexes
transactionally consistent with the original character data:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-e55d8e6971f9f01daaf3e14ce1d2f34485adba6e
Thanks,
-Rick
Rick Hillegas wrote:
> Hi Geoffrey,
>
> I'm on the road right now but I'd like to make some suggestions
> after I gather my thoughts and get over my jet lag. I think that it is
> definitely possible to hook into the query processing layer in order
> to fork the tuple stream so that a listener process can populate the
> Lucene indexes. I think that scraping the replication log stream would
> raise a lot of issues around when work is really committed vs. when
> savepoints are rolled back, and I would recommend against that approach.
>
> Regards,
> -Rick
>
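One way to picture a listener on a forked tuple stream that still respects commits and savepoint rollbacks is to buffer changes until the transaction outcome is known. The sketch below is plain Java under that assumption; it is not Derby's internal API, and `CommitAwareIndexer` and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitAwareIndexer {
    private final List<String> pending = new ArrayList<>();  // uncommitted changes
    private final List<String> indexed = new ArrayList<>();  // stand-in for the index

    /** Called for each row change forked from the tuple stream. */
    void onRowChange(String change) { pending.add(change); }

    /** Returns a savepoint marker: the number of changes buffered so far. */
    int savepoint() { return pending.size(); }

    /** Discards every change buffered after the given savepoint. */
    void rollbackTo(int savepoint) {
        pending.subList(savepoint, pending.size()).clear();
    }

    /** Publishes buffered changes to the index only once the work commits. */
    void commit() {
        indexed.addAll(pending);
        pending.clear();
    }

    /** Discards everything buffered by an aborted transaction. */
    void rollback() { pending.clear(); }

    /** Returns a copy of the changes that have reached the index. */
    List<String> indexedChanges() { return new ArrayList<>(indexed); }
}
```

With this shape, an INSERT that is rolled back to its savepoint never reaches the index. A non-transactional variant would simply write to the index inside `onRowChange` and accept occasional entries for discarded changes.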
> Geoffrey Hendrey wrote:
>> Ok, well on to plan B then. Is there some stage in the preparation
>> of inserts, updates, and deletes at which the logical identity of a
>> row is established? That could be a good place to provide a Lucene
>> hook, or a more general interceptor.
>>
>>
>> On Mar 18, 2009, at 6:55 AM, Jørgen Løland wrote:
>>
>> Geoff Hendrey wrote:
>> I've been following Knut's pointers and reading the docs on the
>> classes that marshal themselves over the wire via their writeObject
>> method.
>> So, question about this:
>> "Type=update, Table=employee, Page=4321, Index=4, field 3=50000"
>> Do the page and index, collectively, constitute a "row ID"?
>> If it is always a constant, then these three fields are sufficient
>> to permanently identify the row, and we can use that information to
>> constitute a document ID in Lucene.
>>
>> It's constant until the record is moved to another page (which
>> means "no", really).
>>
>>
>