[jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?

Rick Hillegas (JIRA) Mon, 04 Nov 2013 11:08:46 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813121#comment-13813121
 ]


Rick Hillegas commented on DERBY-590:
-------------------------------------

Thanks for the second rev of the patch, Andrew. I have a couple comments under 
the following topics:

o Build
o Localization
o Security
o Misc

I applied the patch, copied the Lucene jar files into tools/java, and 
successfully ran LuceneSupportTest. In addition, I ran some more experiments 
with a script attached at the end of this comment.

Thanks,
-Rick

-------------- BUILD ----------------

This optional tool relies on the following Lucene jar files. The user 
documentation should mention that these are necessary in order to use this tool:

A       tools/java/lucene-analyzers-common-4.5.0.jar
A       tools/java/lucene-queryparser-4.5.0.jar
A       tools/java/lucene-core-4.5.0.jar

I see that you intend to check the Lucene jar files into the Derby source tree 
so that this code is always built and tested--even though I also see that you 
can build Derby cleanly without the Lucene jar files. I agree with your 
decision. We have spent considerable effort making it possible for all 
developers to build and test the entire Derby product, and I'd like to stay 
with that model. I understand that the Lucene jar files are big. They are an 
order of magnitude larger than any other jar files checked into Derby. But I 
don't think that including them in our source distributions is burdensome.

I'm not clear on Lucene's compatibility guarantees:

B1) Is there some reason that you chose jar files from Lucene 4.5.0 rather than 
the latest 4.5.1 release?

B2) Our documentation should state which versions of Lucene we support. We 
should make sure that we verify those claims during release testing.



-------------- LOCALIZATION ----------------

L1) LuceneListIndexesVTI and LuceneQueryVTI raise SQLExceptions with hard-coded 
error strings. These should probably be replaced with localizable error 
strings. To see how to do this with optional tools, you can look for the error 
handle OT_BadLoadUnloadArgs in ForeignDBViews.

L2) In addition, when LuceneQueryVTI.getRawColumn() catches an IOException, it 
should use that IOException as the cause of the SQLException that's thrown 
afterward.

-------------- SECURITY ----------------

I think that this tool should run comfortably when a Java security manager is 
installed. I recommend the following changes:

S1) LuceneSupport.dropIndex() should do its File operations inside privilege 
blocks.

S2)  LuceneSupportTest should not disable the security manager.


We need to give some thought to the SQL permissions needed to use this Lucene 
support. I have observed the following:

S3) Only the DBO can load/unload this tool. That seems fine to me.

S4) Ordinary users don't have permission to create/drop indexes. Is that too 
restrictive? It would be friendlier if anyone could create/drop indexes on any 
column they own. That is, on any column in a table in a schema that they own.

S5) Ordinary users don't have permission to use the luceneQuery() table 
function.

S6) ... but if EXECUTE privilege on luceneQuery() is granted to PUBLIC, then 
people can view any indexed data. This seems overbroad. If Fred doesn't have 
SELECT privilege on lucenetest.titles.title, then Fred shouldn't be able to see 
the contents of that column by running luceneQuery(). A poor man's solution to 
this would involve making luceneQuery() issue a dummy select against the 
indexed column first, just to make sure that the user has permission to see 
that data.

S7) I have some misgivings about how this Lucene support will play with 
encrypted databases. The conglomerates will be encrypted but a great deal of 
information will leak out if the Lucene indexes are not encrypted. At a 
minimum, we should file a follow-on JIRA to collect thoughts/solutions related 
to this vulnerability. If we don't address this vulnerability, then we should 
at least describe it in our user documentation.

-------------- MISC ----------------

M1) LuceneListIndexesVTI and  LuceneSupportTest need Apache license headers.

M2) LuceneQueryVTI imports RestrictedVTI but doesn't implement it. Are you 
planning to make LuceneQueryVTI a RestrictedVTI in a future rev?

M3) I like the isolation of the Lucene indexes in a separate "lucene" 
subdirectory of the database. I see that createIndex() creates a subdirectory 
with a name made out of the schema, table, and column arguments. There are 
probably some tricky edge cases involving delimited identifiers and maybe some 
security vulnerabilities involved in exposing these names. The edge cases and 
hypothetical security issues might be eliminated by giving these subdirectories 
less friendly names like $tableID_$columnNumber.


-------------- TEST SCRIPT ----------------

connect 'jdbc:derby:db1;create=true;user=dbo';

call syscs_util.syscs_create_user( 'DBO', 'dbo_password' );
call syscs_util.syscs_create_user( 'LUCENETEST', 'lucenetest_password' );
call syscs_util.syscs_create_user( 'FRED', 'fred_password' );

-- shutdown in order to enable NATIVE authentication
connect 'jdbc:derby:db1;shutdown=true';

connect 'jdbc:derby:db1;user=lucenetest;password=lucenetest_password' as 
lucenetest;

create table titles (ID int generated always as identity, ISBN varchar(16), 
PRINTISBN varchar(16), title varchar(1024), subtitle varchar(1024), author 
varchar(1024), series varchar(1024), publisher varchar(1024), collections 
varchar(128), collections2 varchar(128));

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, 
PUBLISHER, COLLECTIONS, COLLECTIONS2) values 
('9765087650324','9765087650324','The Grapes Of Wrath','The Great Depression in 
Oklahoma','John Steinbeck','Noble Winners','The Viking Press','National Book 
Award','Pulitzer Prize');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, 
PUBLISHER, COLLECTIONS, COLLECTIONS2) values 
('6754278542987','6754278542987','Identical: Portraits of Twins','Best Photo 
Book 2012 by American Photo Magazine','Martin 
Schoeller','Portraits','teNeues','Photography','');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, 
PUBLISHER, COLLECTIONS, COLLECTIONS2) values 
('2747583475882','2747583475882','Vines, Grapes, and Wines','The wine 
drinker''s guide to grape varieties','Jancis Robinson','Reference','Alfred A. 
Knopf','Wine','');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, 
PUBLISHER, COLLECTIONS, COLLECTIONS2) values 
('4356123483483','4356123483483','A Tale of Two Cities','A fictional account of 
events leading up to the French revolution','Charles 
Dickens','Classics','Chapman & Hall','Fiction','Social Criticism');

-- as expected, this user doesn't have permission to load the tool
call syscs_util.syscs_register_tool('luceneSupport',true);

connect 'jdbc:derby:db1;user=dbo;password=dbo_password' as dbo;

-- succeeds because dbo has permission to load optional tools
call syscs_util.syscs_register_tool('luceneSupport',true);

set connection lucenetest;

-- fails due to lack of permission. probably shouldn't fail.
call LuceneSupport.createIndex('lucenetest','titles','title','id');

set connection dbo;

-- succeeds
call LuceneSupport.createIndex('lucenetest','titles','title','id');

set connection lucenetest;

-- fails due to lack of permission. probably shouldn't fail.
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 
'title', 0) ) luceneResults;


set connection dbo;

-- succeeds
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 
'title', 0) ) luceneResults;

grant execute on function LuceneSupport.luceneQuery to public;

set connection lucenetest;

-- works now, after dbo lets everyone run the luceneQuery table function
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 
'title', 0) ) luceneResults;

connect 'jdbc:derby:db1;user=fred;password=fred_password' as dbo;

-- it's unfortunate that fred can view data in a table he doesn't have SELECT 
access to
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 
'title', 0) ) luceneResults;

-- as expected, the following query fails due to lack of permission
select title from lucenetest.titles where 1=2;

set connection dbo;

select id, schemaname, tablename, columnname from table ( 
LuceneSupport.listIndexes('', '') ) listindexes;


> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: lucene_demo.diff, lucene_demo_2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?

Reply via email to