Osma Suominen created JENA-1134:
-----------------------------------
Summary: Support alternative QueryParsers in jena-text
Key: JENA-1134
URL: https://issues.apache.org/jira/browse/JENA-1134
Project: Apache Jena
Issue Type: Improvement
Components: Text
Reporter: Osma Suominen
Assignee: Osma Suominen
Jena-text is currently hardwired to use the Lucene QueryParser. This parser is
(intentionally) limited: it does not run wildcard queries through the analyzer.
Instead, wildcard terms are expanded directly, without analysis.
This is a problem if you want to do accent-insensitive wildcard queries (using
ASCIIFoldingFilter) or other wildcard queries that rely on a special analyzer.
However, Lucene offers an alternative parser, AnalyzingQueryParser, that could be
used in such cases.
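To illustrate the problem, here is a simplified Java simulation. It is not Lucene code, and the class and method names are hypothetical. `fold` is a rough stand-in for ASCIIFoldingFilter plus LowerCaseFilter: it only strips combining accents after NFD normalization and then lowercases. `expand` mimics trailing-wildcard expansion against indexed terms. The example shows that the unanalyzed prefix of `café*` matches none of the folded index terms. Analyzing the prefix first, which is roughly what AnalyzingQueryParser does for wildcard and prefix terms, finds them.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Simplified simulation, not Lucene code: shows why an unanalyzed wildcard
// prefix misses accent-folded index terms.
public class WildcardFoldingDemo {

    // Rough stand-in for ASCIIFoldingFilter + LowerCaseFilter: decompose to NFD,
    // drop combining marks, then lowercase. The real filter covers far more characters.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Expand a trailing-'*' wildcard against indexed terms. If analyzePrefix is
    // true, the prefix is folded the same way as the index terms before matching.
    static List<String> expand(String wildcard, List<String> indexTerms, boolean analyzePrefix) {
        if (!wildcard.endsWith("*")) {
            throw new IllegalArgumentException("Only trailing '*' wildcards are simulated: " + wildcard);
        }
        String prefix = wildcard.substring(0, wildcard.length() - 1);
        String effective = analyzePrefix ? fold(prefix) : prefix;
        return indexTerms.stream()
                .filter(term -> term.startsWith(effective))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Labels are folded at index time, as a folding analyzer would do.
        List<String> index = List.of("Caf\u00e9", "Caf\u00e9t\u00e9ria", "Caffeine").stream()
                .map(WildcardFoldingDemo::fold)
                .collect(Collectors.toList()); // [cafe, cafeteria, caffeine]

        String query = "caf\u00e9*"; // accented query with a trailing wildcard
        System.out.println("unanalyzed: " + expand(query, index, false)); // []
        System.out.println("analyzed:   " + expand(query, index, true));  // [cafe, cafeteria]
    }
}
```

The real QueryParser may still lowercase expanded terms depending on its settings. The accent itself is the part it leaves untouched, which is why the example uses an already-lowercase query.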
I'd like to extend jena-text with a configuration parameter that allows using
AnalyzingQueryParser instead of the standard QueryParser. For example, the
configuration could look like this:
{noformat}
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:queryParser text:AnalyzingQueryParser ;
    text:queryAnalyzer [
        a text:ConfigurableAnalyzer ;
        text:tokenizer text:KeywordTokenizer ;
        text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
    ] ;
    text:entityMap <#entMap> .
{noformat}
I've written some very preliminary code to implement this, but I'm not yet
satisfied with it. It's a bit problematic because the parser cannot be
constructed in advance. It must be created dynamically for each query, because
it needs parameters that can differ between queries.
Thus TextIndexConfig must store information about which parser variant to
use, but not the actual QueryParser/AnalyzingQueryParser instance. This isn't
rocket science, though; some kind of Factory pattern would probably work.
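A minimal sketch of the Factory idea follows. It uses plain Java with stand-in types instead of the real Lucene classes. `QueryParserKind`, `TextQueryParser` and `TextIndexConfigSketch` are hypothetical names, and the analyzer is represented by a plain string. Mapping a `QueryParser` config name to the standard parser is also an assumption. The config object stores only the parser kind. A fresh parser is created for every query from the per-query parameters, which here are the field name and the analyzer.

```java
// Hypothetical sketch: stand-in types, not the jena-text or Lucene API.
public class ParserFactoryDemo {

    // Stand-in for a per-query parser (in real code, a Lucene QueryParser).
    interface TextQueryParser {
        String describe();
    }

    // Hypothetical factory stored in the index config. Each call to create()
    // builds a new parser from parameters that may differ between queries.
    enum QueryParserKind {
        STANDARD {
            @Override
            TextQueryParser create(String field, String analyzerName) {
                return () -> "QueryParser(field=" + field + ", analyzer=" + analyzerName + ")";
            }
        },
        ANALYZING {
            @Override
            TextQueryParser create(String field, String analyzerName) {
                return () -> "AnalyzingQueryParser(field=" + field + ", analyzer=" + analyzerName + ")";
            }
        };

        abstract TextQueryParser create(String field, String analyzerName);

        // Map a config value's local name (e.g. from text:queryParser) to a kind.
        static QueryParserKind fromConfigName(String localName) {
            switch (localName) {
                case "QueryParser":
                    return STANDARD;
                case "AnalyzingQueryParser":
                    return ANALYZING;
                default:
                    throw new IllegalArgumentException("Unknown query parser: " + localName);
            }
        }
    }

    // Hypothetical config holder: keeps the parser kind, never a parser instance.
    static final class TextIndexConfigSketch {
        private final QueryParserKind parserKind;

        TextIndexConfigSketch(QueryParserKind parserKind) {
            this.parserKind = parserKind;
        }

        QueryParserKind getQueryParserKind() {
            return parserKind;
        }
    }

    public static void main(String[] args) {
        TextIndexConfigSketch config =
                new TextIndexConfigSketch(QueryParserKind.fromConfigName("AnalyzingQueryParser"));
        // Each query gets its own parser built from that query's parameters.
        TextQueryParser p1 = config.getQueryParserKind().create("label", "folding");
        TextQueryParser p2 = config.getQueryParserKind().create("altLabel", "folding");
        System.out.println(p1.describe()); // AnalyzingQueryParser(field=label, analyzer=folding)
        System.out.println(p2.describe()); // AnalyzingQueryParser(field=altLabel, analyzer=folding)
    }
}
```

An enum keeps the set of supported parsers closed and maps naturally onto config values such as text:AnalyzingQueryParser. In real code, create() would presumably return a Lucene QueryParser, since AnalyzingQueryParser is a subclass of it.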
For background on why this is needed, see this Skosmos issue:
https://github.com/NatLibFi/Skosmos/issues/424