Osma Suominen created JENA-1134:
-----------------------------------

             Summary: Support alternative QueryParsers in jena-text
                 Key: JENA-1134
                 URL: https://issues.apache.org/jira/browse/JENA-1134
             Project: Apache Jena
          Issue Type: Improvement
          Components: Text
            Reporter: Osma Suominen
            Assignee: Osma Suominen


jena-text is currently hardwired to use the Lucene QueryParser. This parser is 
(intentionally) limited in that it doesn't analyze wildcard queries. Instead, 
wildcard terms are expanded directly, without passing through the analyzer.

This is a problem if you want to do accent-insensitive wildcard queries (using 
ASCIIFoldingFilter) or other wildcard queries which rely on a special analyzer. 
However, Lucene offers an alternate parser, AnalyzingQueryParser, that could be 
used in such cases.
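
To illustrate the problem, here is a small plain-Java simulation. It does not use Lucene at all: fold() is only a rough stand-in for ASCIIFoldingFilter plus LowerCaseFilter, and prefixMatch() is a hypothetical helper that mimics a trailing-* wildcard being expanded against the indexed terms.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java simulation only: no Lucene classes are used here.
class WildcardFoldingDemo {

    // Rough stand-in for ASCIIFoldingFilter + LowerCaseFilter (an assumption,
    // not the real filters): decompose accented characters, drop the
    // combining marks, then lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: prefix match against already-indexed terms,
    // similar in spirit to expanding a trailing-* wildcard.
    static List<String> prefixMatch(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Terms as they would be stored after index-time folding.
        List<String> index = Arrays.asList(
                fold("Caf\u00e9"), fold("Cafeteria"), fold("Tee"));

        String userPrefix = "caf\u00e9"; // the user searches for "café*"

        // Wildcard term used as typed, without analysis.
        System.out.println(prefixMatch(index, userPrefix));
        // Wildcard term folded first, as an analyzing parser would do.
        System.out.println(prefixMatch(index, fold(userPrefix)));
    }
}
```

In this simulation the unanalyzed prefix finds nothing, because the index only contains the folded form "cafe", while the folded prefix matches both "cafe" and "cafeteria".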

I'd like to extend jena-text with a configuration parameter that allows using 
AnalyzingQueryParser instead of the standard QueryParser. For example, the 
configuration could look like this:

{noformat}
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:queryParser text:AnalyzingQueryParser ;
    text:queryAnalyzer [
        a text:ConfigurableAnalyzer ;
        text:tokenizer text:KeywordTokenizer ;
        text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
    ] ;
    text:entityMap <#entMap> .
{noformat}

I've written some very preliminary code to implement this, but I'm not yet 
satisfied with it. It's a bit problematic because the parser cannot be 
constructed in advance but must be dynamically created separately for each 
query (because it needs parameters that can differ between queries). 

Thus the TextIndexConfig must store information about which parser variant to 
use, but not the actual QueryParser/AnalyzingQueryParser instance. This isn't 
rocket science, though; some kind of Factory pattern would probably work.
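
A minimal sketch of that Factory idea might look like the following. It is not the preliminary code mentioned above. Analyzer, QueryParser and AnalyzingQueryParser are stand-in types declared here so the sketch compiles without Lucene. QueryParserType and TextIndexConfigSketch are hypothetical names. I am also assuming that the per-query parameters are the default field and the analyzer.

```java
import java.util.function.BiFunction;

// Stand-in types so the sketch compiles without Lucene on the classpath.
// In jena-text these would be the real Lucene classes of the same names.
interface Analyzer {}

class QueryParser {
    final String field;
    final Analyzer analyzer;

    QueryParser(String field, Analyzer analyzer) {
        this.field = field;
        this.analyzer = analyzer;
    }
}

class AnalyzingQueryParser extends QueryParser {
    AnalyzingQueryParser(String field, Analyzer analyzer) {
        super(field, analyzer);
    }
}

// Hypothetical enum: records which parser variant was configured and
// knows how to build a fresh instance of it.
enum QueryParserType {
    STANDARD(QueryParser::new),
    ANALYZING(AnalyzingQueryParser::new);

    private final BiFunction<String, Analyzer, QueryParser> constructor;

    QueryParserType(BiFunction<String, Analyzer, QueryParser> constructor) {
        this.constructor = constructor;
    }

    QueryParser create(String field, Analyzer analyzer) {
        return constructor.apply(field, analyzer);
    }
}

// Hypothetical stand-in for TextIndexConfig: stores only the variant,
// never a parser instance.
class TextIndexConfigSketch {
    private QueryParserType queryParserType = QueryParserType.STANDARD;

    void setQueryParserType(QueryParserType type) {
        this.queryParserType = type;
    }

    // Called once per query with that query's own parameters.
    QueryParser newQueryParser(String field, Analyzer analyzer) {
        return queryParserType.create(field, analyzer);
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        TextIndexConfigSketch config = new TextIndexConfigSketch();
        config.setQueryParserType(QueryParserType.ANALYZING);

        Analyzer analyzer = new Analyzer() {};
        QueryParser parser = config.newQueryParser("label", analyzer);
        System.out.println(parser.getClass().getSimpleName());
    }
}
```

The point of the design is that the configuration stays immutable and cheap to share, while each query gets its own parser built from whatever field and analyzer apply to it.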

For some background on why this is needed, see this Skosmos issue:
https://github.com/NatLibFi/Skosmos/issues/424



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
