[jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?

Rick Hillegas (JIRA) Mon, 21 Oct 2013 07:39:12 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800696#comment-13800696
 ]


Rick Hillegas commented on DERBY-590:
-------------------------------------

Thanks again for working on this, Andrew. I noticed that lucene_titles.sql 
invokes a procedure called LuceneSupport.indexDatabase(). I can't find that 
procedure in lucene_demo.diff. Where should I look for that procedure?

Here's my crude interpretation of what the code is doing: The tool makes it 
possible to do full-text search on data which is stored in the text columns of 
Derby tables. The tables must have unique Derby indexes. Lucene itself relies 
on indexes which it builds and stores outside Derby in the file system. Over 
time, the Lucene indexes drift out of sync with the text data. The application 
periodically asks Derby to update specific Lucene indexes, bringing them back 
into sync with the text data. 
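
To make that drift-and-resync model concrete, here is a minimal sketch in plain Java. It is a toy model only: it uses neither Derby nor Lucene, and the class name StaleIndexDemo and its methods are hypothetical names invented for illustration. A map keyed by a unique id stands in for the Derby table, and a separate term map stands in for the external Lucene index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: no Derby or Lucene classes are used here.
class StaleIndexDemo {
    // Stands in for a Derby table: unique key -> value of the text column.
    private final Map<Integer, String> table = new HashMap<>();
    // Stands in for the external full-text index: term -> keys of matching rows.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Writes touch the table only, so the index can fall behind.
    void put(int key, String text) {
        table.put(key, text);
    }

    // Re-index a single row, located through its unique key.
    void refresh(int key) {
        // Drop whatever the index currently says about this row.
        for (Set<Integer> keys : index.values()) {
            keys.remove(key);
        }
        String text = table.get(key);
        if (text == null) {
            return; // row no longer exists
        }
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(key);
            }
        }
    }

    // Look up the keys of rows whose indexed text contained the term.
    Set<Integer> search(String term) {
        return new TreeSet<>(index.getOrDefault(term.toLowerCase(), new HashSet<>()));
    }
}
```

In this sketch, a row that is changed after refresh() keeps matching its old terms until refresh() is called again for that key. The unique key is what lets the refresh step find and replace exactly one row's entries, which may be why the tables need unique indexes.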

Loading the tool via syscs_register_tool() creates the following schema objects:

a) LuceneSupport.indexTable() - This procedure indexes a text column in a Derby 
table.

b) LuceneSupport.luceneUpdateDocument() - This procedure updates a Lucene index 
which was created by the previous procedure, bringing the Lucene index back 
into sync with the text data.

c) LuceneSupport.luceneQuery() - This is a table function for running a 
full-text search against a Derby column.

As is, this sounds like a very useful piece of functionality. We could make 
this production-ready incrementally and document it at the end of that effort. 
At a minimum, we would want to:

i) Quibble a bit about the API, the names of schema objects, and where the code 
goes.

ii) Add comments to the code.

iii) Think about edge cases. For example, what happens if the Lucene indexes 
become corrupt or are deleted? How do we keep track of which columns are 
indexed? What happens when Derby is recovered from a backup or the database is 
recreated?

iv) Write tests.

Some follow-on efforts might also make sense:

1) We could consider moving the Lucene indexes inside the database.

2) Maybe we could add triggers on the indexed columns so that the Lucene 
indexes remain in sync with the Derby data. Don't know how much of a 
performance drag that would be. Maybe this could be an optional feature of 
creating a Lucene index.

3) Replace the procedure calls with explicit CREATE FULLTEXT (and maybe UPDATE 
FULLTEXT) statements. This would be an opportunity to think about how we could 
load and unload optional Derby statements.
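
The trade-off in item 2 can be sketched with the same kind of toy model. This is not Derby trigger code: a plain Java flag stands in for the optional trigger, and EagerIndexDemo, syncOnWrite and indexUpdates are hypothetical names invented for illustration. The counter is only a rough proxy for the extra work each write would incur.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: a boolean flag stands in for an optional trigger.
class EagerIndexDemo {
    private final Map<Integer, String> table = new HashMap<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();
    private final boolean syncOnWrite; // the "optional feature" switch
    int indexUpdates = 0;              // rough proxy for extra work per write

    EagerIndexDemo(boolean syncOnWrite) {
        this.syncOnWrite = syncOnWrite;
    }

    // Every write updates the table; with the flag on, it also re-indexes the row.
    void put(int key, String text) {
        table.put(key, text);
        if (syncOnWrite) {
            reindex(key);
        }
    }

    // Replace the index entries for one row with terms from its current text.
    void reindex(int key) {
        indexUpdates++;
        for (Set<Integer> keys : index.values()) {
            keys.remove(key);
        }
        for (String term : table.get(key).toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(key);
            }
        }
    }

    // True if the index currently maps the term to the given row.
    boolean matches(String term, int key) {
        return index.getOrDefault(term.toLowerCase(), new HashSet<>()).contains(key);
    }
}
```

With the flag on, the index never drifts but every write pays for a re-index. With it off, writes are cheap and the index stays stale until someone calls reindex().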

Thanks!
-Rick


> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: lucene_demo.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
