[ https://issues.apache.org/jira/browse/JENA-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217985#comment-15217985 ]
ASF GitHub Bot commented on JENA-1134:
--------------------------------------
Github user osma commented on a diff in the pull request:
https://github.com/apache/jena/pull/131#discussion_r57891059
--- Diff: jena-text/src/test/java/org/apache/jena/query/text/TestDatasetWithAnalyzingQueryParser.java ---
@@ -0,0 +1,64 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.query.text;
+
+import java.util.Set ;
+
+import org.apache.jena.atlas.lib.StrUtils ;
+import org.apache.jena.ext.com.google.common.collect.Sets ;
+import org.junit.Before ;
+import org.junit.Test ;
+
+/**
+ * This class defines a setup configuration for a dataset that uses an
+ * ASCII folding lowercase keyword analyzer with a Lucene index.
+ */
+public class TestDatasetWithAnalyzingQueryParser extends TestDatasetWithConfigurableAnalyzer {
+ @Override
+ @Before
+ public void before() {
+ init(StrUtils.strjoinNL(
+ "text:ConfigurableAnalyzer ;",
+ "text:tokenizer text:KeywordTokenizer ;",
+ "text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)"
+ ), "text:AnalyzingQueryParser");
+ }
+
+ @Test
+ public void testAnalyzingQueryParserAnalyzesWildcards() {
+ final String testName = "testAnalyzingQueryParserAnalyzesWildcards";
+ final String turtle = StrUtils.strjoinNL(
+ TURTLE_PROLOG,
+ "<" + RESOURCE_BASE + testName + ">",
+ " rdfs:label 'éducation'@fr",
+ ".",
+ "<" + RESOURCE_BASE + "irrelevant>",
+ " rdfs:label 'déjà vu'@fr",
+ "."
+ );
+ String queryString = StrUtils.strjoinNL(
+ QUERY_PROLOG,
+ "SELECT ?s",
+ "WHERE {",
+ " ?s text:query ( rdfs:label 'édu*' 10 ) .",
+ "}"
+ );
+ Set<String> expectedURIs = Sets.newHashSet(RESOURCE_BASE + testName);
--- End diff --
This was copied from another, similar unit test
(TestDatasetWithConfigurableAnalyzer.java). I could change both, of course, but
I doubt this makes a big difference for readability.
> Support alternative QueryParsers in jena-text
> ---------------------------------------------
>
> Key: JENA-1134
> URL: https://issues.apache.org/jira/browse/JENA-1134
> Project: Apache Jena
> Issue Type: Improvement
> Components: Text
> Affects Versions: Jena 3.0.1
> Reporter: Osma Suominen
> Assignee: Osma Suominen
>
> Jena-text is currently hardwired to use the Lucene QueryParser. This parser is
> (intentionally) limited so that it doesn't analyze wildcard queries. Instead,
> wildcard terms are expanded directly, without passing through the analyzer.
> This is a problem if you want to do accent-insensitive wildcard queries
> (using ASCIIFoldingFilter) or other wildcard queries which rely on a special
> analyzer. However, Lucene offers an alternate parser, AnalyzingQueryParser,
> that could be used in such cases.
> I'd like to extend jena-text with a configuration parameter that allows using
> AnalyzingQueryParser instead of the standard QueryParser. For example, the
> configuration could look like this:
> {noformat}
> <#indexLucene> a text:TextIndexLucene ;
>     text:directory <file:Lucene> ;
>     text:queryParser text:AnalyzingQueryParser ;
>     text:queryAnalyzer [
>         a text:ConfigurableAnalyzer ;
>         text:tokenizer text:KeywordTokenizer ;
>         text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
>     ] ;
>     text:entityMap <#entMap> .
> {noformat}
> I've written some very preliminary code to implement this, but I'm not yet
> satisfied with it. It's a bit problematic because the parser cannot be
> constructed in advance but must be dynamically created separately for each
> query (because it needs parameters that can differ between queries).
> Thus the TextIndexConfig must store information about which parser variant to
> use, but not the actual QueryParser/AnalyzingQueryParser instance. This isn't
> rocket science, though; some kind of Factory pattern would probably work.
> For some background for why this is needed, see this Skosmos issue:
> https://github.com/NatLibFi/Skosmos/issues/424
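
The factory idea from the issue description above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
from the pull request: QueryParserFactorySketch, SketchParser, ParserVariant
and foldAndLowercase are hypothetical names, and the two parser variants are
plain-Java stand-ins that mimic the behaviour described above rather than the
real Lucene classes. The configuration keeps only the variant; a fresh parser
is created per query from the field and analyzer.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; none of these names are jena-text or Lucene API. */
public class QueryParserFactorySketch {

    /** Stand-in for a parser instance that is created separately for each query. */
    public interface SketchParser {
        String parse(String queryString);
    }

    /** What the index configuration would store: the variant, not a parser instance. */
    public enum ParserVariant {
        STANDARD {
            @Override
            public SketchParser create(String field, UnaryOperator<String> analyzer) {
                // Mimics the described limitation: wildcard terms bypass the analyzer.
                return q -> field + ":" + (q.contains("*") ? q : analyzer.apply(q));
            }
        },
        ANALYZING {
            @Override
            public SketchParser create(String field, UnaryOperator<String> analyzer) {
                // Runs every term, wildcard or not, through the analyzer.
                return q -> field + ":" + analyzer.apply(q);
            }
        };

        /** Factory method: builds a new parser from per-query parameters. */
        public abstract SketchParser create(String field, UnaryOperator<String> analyzer);
    }

    /** Rough stand-in for ASCIIFoldingFilter followed by LowerCaseFilter. */
    public static String foldAndLowercase(String text) {
        // Decompose accented characters, then drop the combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The configured variant is fixed; the parser itself is created per query.
        ParserVariant configured = ParserVariant.ANALYZING;
        SketchParser parser = configured.create("label", QueryParserFactorySketch::foldAndLowercase);
        System.out.println(parser.parse("\u00c9du*"));
    }
}
```

With the ANALYZING variant the accented wildcard query is folded before
matching, which is the case the accent-insensitive searches need. With
STANDARD the same query keeps its accent and would miss a folded index term.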
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)