[
https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Hillegas updated DERBY-590:
--------------------------------
Attachment: derby-590-01-ag-publicAccessToLuceneRoutines.diff
Attaching derby-590-01-ag-publicAccessToLuceneRoutines.diff. This patch builds
on Andrew's approach, wiring Lucene support into Derby's SQL authorization
scheme. If this approach looks promising, I will write additional regression
tests to verify the claims I make about this solution.
Before discussing what this patch does, let me recap what it does NOT do:
1) It does NOT encrypt Lucene indexes if the database is encrypted.
2) It does NOT work with in-memory databases.
3) It does NOT work with backup/restore.
In addition, I have changed the meaning of the updateIndex() procedure and lost
the very useful meaning which it used to have. There are at least two important
application profiles which have different needs from an index-updating
procedure:
i) Update-intensive: These applications do not want to incur the performance
hit of re-indexing a document every time it changes. These applications may
work best with a bulk-reindexing cron job which executes when user activity is
low. This is the use-case supported by the revised updateIndex() procedure: the
Lucene index is dropped and completely recreated although schema objects
connected to it are not bounced.
ii) Read-intensive: These applications can incur the performance hit of
re-indexing a document every time it changes. These applications are well
served by the trigger-driven usage of updateIndex() provided in Andrew's
original patch.
I think that it may be possible to recover Andrew's original functionality. But
this requires more familiarity with Lucene than I can claim. Maybe Andrew can
propose some ideas and we can converge on a solution.
So much for the caveats. Here are the changes made by the new patch:
A) The layout of the Lucene support directory is changed. There is now a
subdirectory for each schema which has indexed tables in it. Under that
directory, there is a subdirectory for each table which has indexed columns.
Under the table-specific directory, there is a subdirectory for each text
column which is indexed. Those are the leaf directories which contain the
actual Lucene files. So the directory structure looks like this:
{noformat}
databaseName
seg0
log
lucene
SCHEMA1
TABLE1
TEXT_COLUMN1
TEXT_COLUMN2
...
{noformat}
B) The luceneQuery() table function has been changed to be context aware. See
the work recently done on https://issues.apache.org/jira/browse/DERBY-6117.
Users no longer invoke luceneQuery() directly. Instead, for each indexed
column, we create a table function specific to that column. The table function
has the name $schemaName.$tableName__$columnName. This is the crucial change
which allows us to wire Lucene support up to Derby's SQL authorization. The
schema owner can grant EXECUTE privilege on the column-specific table function.
In addition, the underlying context-aware luceneQuery() machinery starts out by
issuing a select against the text column and the key columns of the table. This
ensures that users must enjoy SELECT privilege on all of those columns in order
to run the column-specific table function.
C) A text column can be indexed only if the table has a primary key. The
primary key can have multiple columns in it and the columns can be any
indexable datatype. The column-specific table function returns a data set which
includes the whole primary key plus the Lucene document id plus the rank number
(the relevance of that document according to Lucene's calculations). This makes
it easy to join the table function to the original table in order to obtain
more extensive results.
D) The arguments to the Lucene support procedures are now case-insensitive SQL
identifiers rather than case-sensitive strings. This is a departure from the
convention followed by most Derby system procedures, but I think the departure
is welcome and long overdue.
E) The Lucene support has been moved out of the tools jar and into the engine
jar. This fixes the big surprise I recorded on
https://issues.apache.org/jira/browse/DERBY-6470. This change also made it
possible to use IdUtil in order to implement the semantics described above in
D).
F) It is assumed that the Lucene jars will be checked into the Derby source
tree and that the Lucene support classes will always be built. However, there
is still no plan to include Lucene jars in Derby binary distributions. Users
who want to enable the optional Lucene support will have to install the Lucene
jars themselves.
G) In general, the following convention has been followed for each supported
operation: Transactional writes are performed before any calls are made to
Lucene. This means that if the Lucene calls raise an error, then the
transactional writes are rolled back. I think that following a consistent
convention like this will make it easier for users to reason about how Lucene
support behaves.
H) Miscellaneous improvements have been made in the areas of code factoring and
integration with the Java security manager.
Here's how you use the revised tool:
Let's say that you have a table with this shape...
{noformat}
create table lucenetest.titles
(
ID int generated always as identity primary key,
ISBN varchar(16),
PRINTISBN varchar(16),
title varchar(1024),
subtitle varchar(1024),
author varchar(1024),
series varchar(1024),
publisher varchar(1024),
collections varchar(128),
collections2 varchar(128)
);
{noformat}
The DBO loads the Lucene support tool as follows...
{noformat}
call syscs_util.syscs_register_tool( 'luceneSupport', true );
{noformat}
This creates the LuceneSupport schema, containing the following procedures and
functions...
{noformat}
create procedure LuceneSupport.createIndex
(
schemaname varchar( 128 ),
tablename varchar( 128 ),
textcolumn varchar( 128 )
)
parameter style java modifies sql data language java
external name 'org.apache.derby.impl.optional.lucene.LuceneSupport.createIndex';
create procedure LuceneSupport.dropIndex
(
schemaname varchar( 128 ),
tablename varchar( 128 ),
textcolumn varchar( 128 )
)
parameter style java modifies sql data language java
external name 'org.apache.derby.impl.optional.lucene.LuceneSupport.dropIndex';
create procedure LuceneSupport.updateIndex
(
schemaname varchar( 128 ),
tablename varchar( 128 ),
textcolumn varchar( 128 )
)
parameter style java reads sql data language java
external name 'org.apache.derby.impl.optional.lucene.LuceneSupport.updateIndex';
create function LuceneSupport.listIndexes() returns table
(
id int,
schemaname char( 128 ),
tablename char( 128 ),
columnname char( 128 ),
lastupdated timestamp
)
language java parameter style DERBY_JDBC_RESULT_SET contains sql
external name 'org.apache.derby.impl.optional.lucene.LuceneSupport.listIndexes'
{noformat}
The LUCENETEST user can then index a text column as follows. The DBO can do
this too. However, no-one else can index the column. That is because the
createIndex() procedure attempts to create a table function in the lucenetest
schema. Other users do not enjoy that privilege...
{noformat}
call LuceneSupport.createIndex( 'lucenetest', 'titles', 'title' );
{noformat}
...which creates the following directory structure...
{noformat}
databaseName
seg0
log
lucene
LUCENETEST
TITLES
TITLE
...
{noformat}
...and the following column-specific table function:
{noformat}
create function lucenetest.titles__title( query varchar( 32672 ), rankCutoff
double )
returns table
(
ID int,
documentID int,
rank double
)
language java parameter style derby_jdbc_result_set contains sql
external name 'org.apache.derby.impl.optional.lucene.LuceneSupport.luceneQuery';
{noformat}
The LUCENETEST user can then query the Lucene index and join it to the original
table as follows...
{noformat}
select title, author, publisher, documentID, rank
from lucenetest.titles t, table ( lucenetest.titles__title( 'grapes', 0 ) ) l
where t.id = l.id;
{noformat}
The LUCENETEST user (or the DBO but no-one else) can drop the Lucene index as
follows...
{noformat}
call LuceneSupport.dropIndex( 'lucenetest', 'titles','title' );
{noformat}
...which drops the lucene/LUCENETEST/TITLES/TITLE directory. Directory deletion
cascades up. That is, if there are no more indexed columns in
lucenetest.titles, then we also drop lucene/LUCENETEST/TITLES. And we drop
lucene/LUCENETEST if there are no more indexed tables in the lucenetest schema.
The DBO (and only the DBO) can unload Lucene support as follows...
{noformat}
call syscs_util.syscs_register_tool( 'luceneSupport', false );
{noformat}
...which drops all procedures and table functions created by the tool. This
command also drops the lucene subdirectory and everything under it.
Touches the following files:
--------------------
M java/engine/org/apache/derby/impl/sql/compile/StaticMethodCallNode.java
Adds the Lucene support package to the list of Derby packages whose entry
points can be bound to user-defined routines.
--------------------
A java/engine/org/apache/derby/impl/optional
A java/engine/org/apache/derby/impl/optional/lucene
A java/engine/org/apache/derby/impl/optional/lucene/LuceneQueryVTI.java
A java/engine/org/apache/derby/impl/optional/lucene/LuceneSupport.java
A
java/engine/org/apache/derby/impl/optional/lucene/LuceneListIndexesVTI.java
A java/engine/org/apache/derby/impl/optional/lucene/build.xml
M java/engine/org/apache/derby/impl/build.xml
M java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
The Lucene support optional tool.
--------------------
M java/engine/org/apache/derby/vti/VTITemplate.java
Some additional support for context-aware table functions. This support allows
the function to query the JDBC metadata for the shape of the ResultSet it
returns.
--------------------
M java/engine/org/apache/derby/loc/messages.xml
M java/shared/org/apache/derby/shared/common/reference/SQLState.java
New error messages for Lucene support.
--------------------
M build.xml
A tools/java/lucene-analyzers-common-4.5.0.jar
A tools/java/lucene-queryparser-4.5.0.jar
A tools/java/lucene-core-4.5.0.jar
M tools/ant/properties/extrapath.properties
M tools/jar/extraDBMSclasses.properties
Miscellaneous build machinery.
--------------------
A tools/release/notices/lucene.txt
The Lucene notice file which must be included in the Derby NOTICE file if we
are to ship the lucene jars in Derby source distributions.
--------------------
M
java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java
A
java/testing/org/apache/derbyTesting/functionTests/tests/lang/LuceneSupportTest.java
Initial tests.
> How to integrate Derby with Lucene API?
> ---------------------------------------
>
> Key: DERBY-590
> URL: https://issues.apache.org/jira/browse/DERBY-590
> Project: Derby
> Issue Type: Improvement
> Components: Documentation, SQL
> Reporter: Abhijeet Mahesh
> Labels: derby_triage10_11
> Attachments: derby-590-01-ag-publicAccessToLuceneRoutines.diff,
> lucene_demo.diff, lucene_demo_2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken?
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)