[ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800696#comment-13800696 ]
Rick Hillegas commented on DERBY-590:
-------------------------------------
Thanks again for working on this, Andrew. I noticed that lucene_titles.sql
invokes a procedure called LuceneSupport.indexDatabase(). I can't find that
procedure in lucene_demo.diff. Where should I look for that procedure?

Here's my crude interpretation of what the code is doing: The tool makes it
possible to do full-text search on data which is stored in the text columns of
Derby tables. The tables must have unique Derby indexes. Lucene itself relies
on indexes which it builds and stores outside Derby in the file system. Over
time, the Lucene indexes drift out of sync with the text data. The application
periodically asks Derby to update specific Lucene indexes, bringing them back
into sync with the text data.

Loading the tool via syscs_register_tool() creates the following schema objects:
a) LuceneSupport.indexTable() - This procedure indexes a text column in a Derby
table.
b) LuceneSupport.luceneUpdateDocument() - This procedure updates a Lucene index
which was created by the previous procedure, bringing the Lucene index back
into sync with the text data.
c) LuceneSupport.luceneQuery() - This is a table function for running a
full-text search against a Derby column.
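
To make the drift-and-resync behavior described above concrete, here is a
minimal toy sketch in Java. It is not the code in lucene_demo.diff and it uses
neither the Derby nor the Lucene API: plain in-memory maps stand in for a Derby
table and for the external index. The names DriftSketch, reindex(), and query()
are hypothetical, and reindex() covers both (a) and (b) by rebuilding from
scratch, which is an assumption made purely to keep the example short.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: plain maps stand in for a Derby table and an external Lucene index. */
class DriftSketch {
    /** Stand-in for a Derby table: unique key -> value of the text column. */
    final Map<Integer, String> table = new HashMap<>();

    /** Stand-in for the external index: lower-cased term -> keys of matching rows. */
    final Map<String, Set<Integer>> index = new HashMap<>();

    /** Analogue of (a) and (b): rebuild the index from the current table contents. */
    void reindex() {
        index.clear();
        for (Map.Entry<Integer, String> row : table.entrySet()) {
            for (String term : row.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(row.getKey());
                }
            }
        }
    }

    /** Analogue of (c): single-term lookup that reads the index only, never the table. */
    Set<Integer> query(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        DriftSketch db = new DriftSketch();
        db.table.put(1, "Moby Dick");
        db.reindex();
        db.table.put(2, "Dick Tracy");          // the index is now stale
        System.out.println(db.query("dick"));   // prints [1]
        db.reindex();                           // bring the index back into sync
        System.out.println(db.query("dick"));   // prints [1, 2]
    }
}
```

The point of the sketch is only that a query can silently miss rows written
after the last index update, which is why the application has to call the
update procedure periodically.
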
As is, this sounds like a very useful piece of functionality. We could make
this production-ready incrementally and document it at the end of that effort.
At a minimum, we would want to:
i) Quibble a bit about the API, the names of schema objects, and where the code
goes.
ii) Add comments to the code.
iii) Think about edge cases. For example, what happens if the Lucene indexes
become corrupt or are deleted? How do we keep track of which columns are
indexed? What happens when Derby is recovered from a backup or the database is
recreated?
iv) Write tests.

Some follow-on efforts might also make sense:
1) We could consider moving the Lucene indexes inside the database.
2) Maybe we could add triggers on the indexed columns so that the Lucene
indexes remain in sync with the Derby data. I don't know how much of a
performance drag that would be. Maybe this could be an optional feature of
creating a Lucene index.
3) Replace the procedure calls with explicit CREATE FULLTEXT (and maybe UPDATE
FULLTEXT) statements. This would be an opportunity to think about how we could
load and unload optional Derby statements.
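
As a rough illustration of follow-on item 2, here is a toy Java sketch of
trigger-style maintenance, where every write to the text data also updates the
index in the same call. Again, this is an in-memory model under stated
assumptions, not Derby trigger syntax and not the Lucene API; TriggerSketch and
its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: the write path maintains the index on every change, as a trigger would. */
class TriggerSketch {
    private final Map<Integer, String> table = new HashMap<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Insert or update a row; the "trigger" work happens inside the same call. */
    void put(int key, String text) {
        String old = table.put(key, text);
        if (old != null) {
            unindex(key, old);                  // drop postings for the previous text
        }
        for (String term : terms(text)) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
        }
    }

    /** Delete a row together with its postings. */
    void delete(int key) {
        String old = table.remove(key);
        if (old != null) {
            unindex(key, old);
        }
    }

    /** Single-term lookup against the index. */
    Set<Integer> query(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    private void unindex(int key, String text) {
        for (String term : terms(text)) {
            Set<Integer> keys = index.get(term);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    index.remove(term);
                }
            }
        }
    }

    /** Split text into distinct lower-cased terms, skipping empty tokens. */
    private static Set<String> terms(String text) {
        Set<String> out = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TriggerSketch t = new TriggerSketch();
        t.put(1, "Moby Dick");
        t.put(2, "Dick Tracy");
        t.put(1, "Moby Duck");                  // update: old postings removed, new ones added
        System.out.println(t.query("dick"));    // prints [2]
        System.out.println(t.query("duck"));    // prints [1]
    }
}
```

The index never goes stale here, but every put() and delete() pays the
tokenizing and index-maintenance cost up front. That is the performance
question raised in item 2, and a reason to make this behavior optional.
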
Thanks!
-Rick

> How to integrate Derby with Lucene API?
> ---------------------------------------
>
> Key: DERBY-590
> URL: https://issues.apache.org/jira/browse/DERBY-590
> Project: Derby
> Issue Type: Improvement
> Components: Documentation, SQL
> Reporter: Abhijeet Mahesh
> Labels: derby_triage10_11
> Attachments: lucene_demo.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken?