[jira] [Updated] (DERBY-590) How to integrate Derby with Lucene API?

Knut Anders Hatlen (JIRA) Fri, 06 Jun 2014 04:26:20 -0700

     [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Knut Anders Hatlen updated DERBY-590:
-------------------------------------

    Attachment: multifield.diff

Thanks, Rick. Those were the exact changes that were needed.

The attached patch [^multifield.diff] shows an example of how it could be used.

I made two small adjustments:

1) Instead of hard-coding the field names, I made LuceneSupport read them 
dynamically from a database property (derby.tests.lucene.fields), so that I 
could verify that the original Lucene tests still pass. (They do still pass, by 
the way.) Also the field names are stored in the Lucene index property file, so 
that LuceneQueryVTI can find them too. This is of course just a temporary hack 
until we figure out the correct API.

2) I made LuceneUtils.defaultQueryParser() always return a 
MultiFieldQueryParser, since MultiFieldQueryParser seems to behave just like 
QueryParser in the degenerate case with a single field.
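
As a rough illustration of the first adjustment, the sketch below reads a list of field names from a property and falls back to a single default field. Only the property name derby.tests.lucene.fields comes from this message. The class name, the comma-separated format, the fallback field and the use of java.util.Properties as a stand-in for Derby's database properties are all assumptions made for the example, not what the patch necessarily does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Derby or of multifield.diff.
final class FieldNameProperty {
    // Property name as given in the message.
    static final String PROPERTY = "derby.tests.lucene.fields";

    // Assumption: the property holds a comma-separated list of field names.
    // If it is missing or empty, a single caller-supplied field is used.
    static String[] readFieldNames(Properties props, String defaultField) {
        List<String> names = new ArrayList<>();
        String raw = props.getProperty(PROPERTY);
        if (raw != null) {
            for (String part : raw.split(",")) {
                String name = part.trim();
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        if (names.isEmpty()) {
            names.add(defaultField);
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROPERTY, "tags, text");
        // "content" is a made-up fallback field name.
        System.out.println(String.join(",", readFieldNames(props, "content")));
    }
}
```

With the property unset, the result is a one-element array, which corresponds to the degenerate single-field case mentioned in the second adjustment.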

Since I didn't feel like writing a Java source file parser, I changed my 
example use case to search in XML files, so that I could use the XML parser 
that is in the JRE. I added a test case to LuceneSupportTest to verify that it 
could be used for that.

The test case creates an index with two fields: tags and text. The tags field 
contains only the XML tags, whereas the text field contains only the text 
elements of the XML file. This way, you can use the index to search for data 
and metadata separately in the XML documents stored in your table.
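
To make the tags/text split concrete, here is a minimal sketch that uses the SAX parser shipped with the JRE to separate element names from character data. It is not the code from the patch: the class and method names are hypothetical, and it only shows how the two values could be extracted before being handed to an indexer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of Derby or of multifield.diff.
final class XmlFieldSplitter {
    private final List<String> tags = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Element names in document order (candidate content for a "tags" field).
    List<String> tags() {
        return tags;
    }

    // Character data with whitespace collapsed (candidate content for a "text" field).
    String text() {
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    static XmlFieldSplitter split(String xml) {
        XmlFieldSplitter result = new XmlFieldSplitter();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                result.tags.add(qName);
                result.text.append(' '); // keep text of adjacent elements apart
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                result.text.append(' ');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                result.text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        XmlFieldSplitter r = split("<doc><title>abc</title><body>def</body></doc>");
        System.out.println("tags: " + r.tags());
        System.out.println("text: " + r.text());
    }
}
```

Attributes, comments and namespaces are ignored in this sketch; a real analyzer would have to decide how to treat them.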

Now, while writing the test case, I found that you will most likely want to use 
a custom query parser when you use it this way. The reason is that the default 
query parser extracts tokens from the search terms with the same analyzer that 
the index writer used. That means that if you, as in this case, use a custom 
analyzer that parses XML documents, the query parser will also expect the terms 
in the query to be XML documents. So you'll end up with rather silly-looking 
queries.

For example, to search for documents that contain the text "abc", you cannot 
make the query {{text:"abc"}}, but have to wrap it in dummy XML tags to make it 
parsable: {{text:"<dummy>abc</dummy>"}}.

The custom query parser doesn't need to be very complex, though. The test case 
in the patch shows one example in the method {{createXMLQueryParser()}}. That 
method simply creates a MultiFieldQueryParser with a plain StandardAnalyzer. 
With that parser, you can write queries like:

- {{text:abc}} to search for "abc" in the text elements of the XML

- {{tags:abc}} to search for XML tags called "abc"

- {{abc}} to search for "abc" in both text elements and tags

What do you think? Does it sound like a useful addition?

> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: LucenePlugin.html, LucenePlugin.html, LucenePlugin.html, 
> derby-590-01-ag-publicAccessToLuceneRoutines.diff, 
> derby-590-01-ah-publicAccessToLuceneRoutines.diff, 
> derby-590-01-am-publicAccessToLuceneRoutines.diff, 
> derby-590-02-aa-cleanupFindbugsErrors.diff, 
> derby-590-03-aa-removeTestingDiagnostic.diff, 
> derby-590-04-aa-removeIDFromListIndexes.diff, 
> derby-590-05-aa-accessDeclaredMembers.diff, 
> derby-590-06-aa-suppressAccessChecks.diff, 
> derby-590-07-aa-accessClassInPackage.sun.misc.diff, 
> derby-590-08-aa-omitLuceneFlag.diff, 
> derby-590-09-aa-localeSensitiveAnalysis.diff, 
> derby-590-10-aa-fixLocaleTest.diff, derby-590-11-aa-moveCode.diff, 
> derby-590-12-aa-newJar.diff, derby-590-13-aa-indexViews.diff, 
> derby-590-14-aa-coarseGrainedAuthorization.diff, 
> derby-590-15-aa-requireHardUpgrade.diff, 
> derby-590-16-aa-adjustUpgradeTest.diff, 
> derby-590-17-aa-closeInputStreamOnPropertiesFile.diff, 
> derby-590-18-aa-cleanupAPI.diff, derby-590-19-aa-cleanupAPI2.diff, 
> derby-590-20-aa-customQueryParser.diff, derby-590-21-aa-noTimeTravel.diff, 
> derby-590-22-aa-cleanupPrivacy.diff, derby-590-23-aa-correctTestLocale.diff, 
> derby-590-24-ad-luceneDirectory.diff, derby-590-26-ac-backupRestore.diff, 
> derby-590-26-ad-backupRestoreEncryption.diff, 
> derby-590-27-aa-publicAPILuceneUtils.diff, 
> derby-590-28-renameLuceneJars.diff, derby-590-29-aa-useLucene_4.7.1.diff, 
> derby-590-30-aa-nullableScoreCeiling.diff, exceptions.diff, lucene_demo.diff, 
> lucene_demo_2.diff, multifield.diff, netbeans.diff, netbeans2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
