[jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?

Knut Anders Hatlen (JIRA) Thu, 05 Jun 2014 03:36:20 -0700

    [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018659#comment-14018659 ]


Knut Anders Hatlen commented on DERBY-590:
------------------------------------------

I suppose you could simulate the functionality that way. You'd probably need a 
custom query parser as well, in that case, in order to make the query language 
understand that "method:compute" is a single token. In the default Lucene query 
parser, that would be interpreted as a search for the token "compute" in the 
field "method".
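
To make that default reading concrete, here is a minimal sketch in plain Java
of the field:term convention. It is not Lucene's parser: the class name
FieldedTerm and its parse method are made up for illustration, and it only
handles a single clause.

```java
// Illustration only: mimics how a single "field:term" clause is read by
// default. FieldedTerm and parse() are hypothetical names, not part of
// Lucene or Derby.
final class FieldedTerm {
    final String field;
    final String term;

    FieldedTerm(String field, String term) {
        this.field = field;
        this.term = term;
    }

    // Splits a clause at the first colon. A clause without a colon is
    // taken to be a term in the default field.
    static FieldedTerm parse(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon < 0) {
            return new FieldedTerm(defaultField, clause);
        }
        return new FieldedTerm(clause.substring(0, colon),
                               clause.substring(colon + 1));
    }
}
```

With this reading, parse("method:compute", "luceneTextField") yields the field
"method" and the term "compute", which is why a synthetic token spelled
"method:compute" would need a different parser.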

By the way, when I said "multiple indexes" and "multiple analyzers" above, I 
think what I meant is called "multiple fields" in Lucene speak. I think it's 
still called a single index in Lucene speak, even if you index separately on 
multiple fields/keys.

Currently, when the luceneSupport tool creates an index, it makes every string 
value a Document with a single field called "luceneTextField".

{code:title=LuceneSupport.java#createOrRecreateIndex}
                String  textcolValue = rs.getString( keyCount + 1 );
                if ( textcolValue != null )
                {
                    doc.add(new TextField( LuceneQueryVTI.TEXT_FIELD_NAME,
                                           textcolValue, Store.NO));
                }
                addDocument( iw, doc );
{code}

The flexibility I was looking for was the ability to have more fields than the 
single, hard-coded one. For example, CREATEINDEX (and UPDATEINDEX) could take 
an extra argument which is a comma-separated list of field names (with a 
reasonable default when NULL), and the above code could add each of those 
fields to the document.
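
As a rough sketch of how such an argument might be interpreted (plain Java, no
Lucene or Derby classes involved): the class name FieldNameList and the method
parse are hypothetical, the default is the existing "luceneTextField" name
mentioned above, and treating an empty list the same as NULL is my own
assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the luceneSupport tool.
final class FieldNameList {
    // The single field name used today, as quoted earlier in this comment.
    static final String DEFAULT_FIELD = "luceneTextField";

    // Turns a comma-separated argument into a list of field names.
    // NULL (or a list with no usable names) falls back to the default field.
    static List<String> parse(String fieldNames) {
        if (fieldNames == null) {
            return Collections.singletonList(DEFAULT_FIELD);
        }
        List<String> result = new ArrayList<>();
        for (String name : fieldNames.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        if (result.isEmpty()) {
            // Assumption: an empty argument behaves like NULL.
            return Collections.singletonList(DEFAULT_FIELD);
        }
        return result;
    }
}
```

The indexing loop would then add one text field per returned name instead of
the single hard-coded one.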

In my hypothetical example with Java code in a CLOB, that would mean something 
like this for creating the index:

{code:sql}
CALL LUCENESUPPORT.CREATEINDEX('app', 'sourcefiles', 'sourcetext', 
'MyAnalyzer.create', 'comment,method', 'pk')
{code}

The custom analyzer would be something like this:

{code}
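import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public class MyAnalyzer extends Analyzer {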

    public static Analyzer create() {
        return new MyAnalyzer();
    }

    @Override
    protected TokenStreamComponents createComponents(String field, Reader r) {
        switch (field) {
            case "comment":
                return new TokenStreamComponents(createCommentTokenizer(r));
            case "method":
                return new TokenStreamComponents(createMethodTokenizer(r));
            default:
                throw new AssertionError("unknown field name: " + field);
        }
    }

    private static Tokenizer createCommentTokenizer(Reader r) {
        // TODO: Create a tokenizer that extracts tokens only from
        // code comments.
        throw new UnsupportedOperationException("not implemented yet");
    }

    private static Tokenizer createMethodTokenizer(Reader r) {
        // TODO: Create a tokenizer that only returns method names.
        throw new UnsupportedOperationException("not implemented yet");
    }

}
{code}
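
To give an idea of what the two tokenizers would have to extract, here is a
crude, regex-based sketch in plain Java. It is not a Lucene Tokenizer and not
a proposal for the real implementation: the class SourceScanner and its
methods are hypothetical, string literals containing comment markers are not
handled, and methods declared with a throws clause are missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Derby or Lucene. It only approximates
// Java syntax with regular expressions.
final class SourceScanner {
    // Line comments and block comments.
    private static final Pattern COMMENT =
            Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    // An identifier followed by a parenthesized list and an opening brace.
    private static final Pattern METHOD =
            Pattern.compile("\\b([A-Za-z_$][\\w$]*)\\s*\\([^()]*\\)\\s*\\{");

    // Keywords that fit the METHOD pattern but are not method names.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "if", "for", "while", "switch", "catch", "synchronized", "try"));

    // Returns the raw text of every comment, including the comment markers.
    static List<String> comments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns the names of declared methods (and constructors).
    static List<String> methodNames(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = METHOD.matcher(source);
        while (m.find()) {
            String name = m.group(1);
            if (!KEYWORDS.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }
}
```

A real tokenizer would more likely sit on top of a proper Java lexer, but the
split between "comment" text and "method" names is the same.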

This might not add any functionality that you couldn't work around somehow 
with the current implementation. But I think that the extra flexibility would 
allow the application to push more of the full-text search logic down to 
Lucene, where it belongs. At the very least, you'd avoid the need for a custom 
query parser and the creation of synthetic tokens.

> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: LucenePlugin.html, LucenePlugin.html, LucenePlugin.html, 
> derby-590-01-ag-publicAccessToLuceneRoutines.diff, 
> derby-590-01-ah-publicAccessToLuceneRoutines.diff, 
> derby-590-01-am-publicAccessToLuceneRoutines.diff, 
> derby-590-02-aa-cleanupFindbugsErrors.diff, 
> derby-590-03-aa-removeTestingDiagnostic.diff, 
> derby-590-04-aa-removeIDFromListIndexes.diff, 
> derby-590-05-aa-accessDeclaredMembers.diff, 
> derby-590-06-aa-suppressAccessChecks.diff, 
> derby-590-07-aa-accessClassInPackage.sun.misc.diff, 
> derby-590-08-aa-omitLuceneFlag.diff, 
> derby-590-09-aa-localeSensitiveAnalysis.diff, 
> derby-590-10-aa-fixLocaleTest.diff, derby-590-11-aa-moveCode.diff, 
> derby-590-12-aa-newJar.diff, derby-590-13-aa-indexViews.diff, 
> derby-590-14-aa-coarseGrainedAuthorization.diff, 
> derby-590-15-aa-requireHardUpgrade.diff, 
> derby-590-16-aa-adjustUpgradeTest.diff, 
> derby-590-17-aa-closeInputStreamOnPropertiesFile.diff, 
> derby-590-18-aa-cleanupAPI.diff, derby-590-19-aa-cleanupAPI2.diff, 
> derby-590-20-aa-customQueryParser.diff, derby-590-21-aa-noTimeTravel.diff, 
> derby-590-22-aa-cleanupPrivacy.diff, derby-590-23-aa-correctTestLocale.diff, 
> derby-590-24-ad-luceneDirectory.diff, derby-590-26-ac-backupRestore.diff, 
> derby-590-26-ad-backupRestoreEncryption.diff, 
> derby-590-27-aa-publicAPILuceneUtils.diff, 
> derby-590-28-renameLuceneJars.diff, derby-590-29-aa-useLucene_4.7.1.diff, 
> derby-590-30-aa-nullableScoreCeiling.diff, exceptions.diff, lucene_demo.diff, 
> lucene_demo_2.diff, netbeans.diff, netbeans2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
