[ https://issues.apache.org/jira/browse/JENA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991709#comment-14991709 ]
ASF GitHub Bot commented on JENA-1062:
--------------------------------------
Github user ajs6f commented on a diff in the pull request:
https://github.com/apache/jena/pull/97#discussion_r44015231
--- Diff: jena-text/src/main/java/org/apache/jena/query/text/assembler/ConfigurableAnalyzerAssembler.java ---
@@ -0,0 +1,101 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.query.text.assembler;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.jena.assembler.Assembler;
+import org.apache.jena.assembler.Mode;
+import org.apache.jena.assembler.assemblers.AssemblerBase;
+import org.apache.jena.query.text.TextIndexException;
+import org.apache.jena.query.text.TextIndexLucene;
+import org.apache.jena.query.text.analyzer.ConfigurableAnalyzer;
+import org.apache.jena.rdf.model.RDFNode;
+import org.apache.jena.rdf.model.Resource;
+import org.apache.jena.rdf.model.Statement ;
+import org.apache.jena.vocabulary.RDF ;
+import org.apache.lucene.analysis.Analyzer;
+
+
+/**
+ * Assembler to create a configurable analyzer.
+ */
+public class ConfigurableAnalyzerAssembler extends AssemblerBase {
+ /*
+ text:map (
+ [ text:field "text" ;
+ text:predicate rdfs:label;
+ text:analyzer [
+ a text:ConfigurableAnalyzer ;
+ text:tokenizer text:LetterTokenizer ;
+ text:filters (text:LowerCaseFilter)
+ ]
+ ]
+ .
+ */
+
+
+ @Override
+ public Analyzer open(Assembler a, Resource root, Mode mode) {
+ if (root.hasProperty(TextVocab.pTokenizer)) {
+ Resource tokenizerResource = (Resource) root.getProperty(TextVocab.pTokenizer).getObject();
+ String tokenizer = tokenizerResource.getLocalName();
+ List<String> filters;
+ if (root.hasProperty(TextVocab.pFilters)) {
+ Resource filtersResource = (Resource) root.getProperty(TextVocab.pFilters).getObject();
+ filters = toFilterList(filtersResource);
+ } else {
+ filters = new ArrayList<>();
+ }
+ return new ConfigurableAnalyzer(TextIndexLucene.VER, tokenizer, filters);
+ } else {
+ throw new TextIndexException("text:tokenizer setting is required by ConfigurableAnalyzer");
+ }
+ }
+
+ private List<String> toFilterList(Resource list) {
+ List<String> result = new ArrayList<>();
+ Resource current = list;
+ while (current != null && ! current.equals(RDF.nil)){
+ Statement stmt = current.getProperty(RDF.first);
+ if (stmt == null) {
+ throw new TextIndexException("filter list not well formed");
+ }
+ RDFNode node = stmt.getObject();
+ if (! node.isResource()) {
+ throw new TextIndexException("filter is not a resource : " + node);
+ }
+
+ Resource res = (Resource)node;
--- End diff --
Isn't `RDFNode::asResource` a little better than the cast, here and below?
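
To illustrate the suggestion, here is a minimal, self-contained sketch contrasting the two styles. It does not use Jena: `SketchNode`, `SketchResource`, `SketchLiteral` and the enclosing `CastVersusAsResource` class are hypothetical stand-ins invented for this example, and the exception types are placeholders. The point is only that a conversion method keeps the type check and the failure message in one place, whereas the cast needs a separate guard to avoid a bare `ClassCastException`.

```java
// Illustration only: the nested types below are hypothetical stand-ins,
// not Jena's RDFNode, Resource and Literal.
final class CastVersusAsResource {

    interface SketchNode {
        boolean isResource();

        // Conversion method in the style of an as-resource accessor.
        SketchResource asResource();
    }

    static final class SketchResource implements SketchNode {
        final String localName;

        SketchResource(String localName) { this.localName = localName; }

        public boolean isResource() { return true; }

        public SketchResource asResource() { return this; }
    }

    static final class SketchLiteral implements SketchNode {
        final String value;

        SketchLiteral(String value) { this.value = value; }

        public boolean isResource() { return false; }

        public SketchResource asResource() {
            // The conversion fails with a message that names the offending node.
            throw new IllegalStateException("resource required, got literal: " + value);
        }
    }

    // Style used in the diff: explicit guard followed by a cast.
    static String viaCast(SketchNode node) {
        if (!node.isResource()) {
            throw new IllegalArgumentException("filter is not a resource");
        }
        return ((SketchResource) node).localName;
    }

    // Style suggested in the review: let the node convert itself.
    static String viaAsResource(SketchNode node) {
        return node.asResource().localName;
    }

    public static void main(String[] args) {
        SketchNode filter = new SketchResource("LowerCaseFilter");
        System.out.println(viaCast(filter));
        System.out.println(viaAsResource(filter));
    }
}
```

In the code under review the explicit `isResource()` check may still be worth keeping if the custom `TextIndexException` message is wanted; the conversion method would then only replace the cast itself.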
> add ConfigurableAnalyzer to jena-text
> -------------------------------------
>
> Key: JENA-1062
> URL: https://issues.apache.org/jira/browse/JENA-1062
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Reporter: Osma Suominen
> Assignee: Osma Suominen
>
> This is an alternative to JENA-1058 (which implemented a very specific Lucene
> Analyzer for jena-text). The idea here, based on a comment by Claude Warren
> on JENA-1058, is to provide a ConfigurableAnalyzer that can be configured
> with a Tokenizer and (optionally) one or more TokenFilters, like this:
> text:analyzer [
> a text:ConfigurableAnalyzer ;
> text:tokenizer text:KeywordTokenizer ;
> text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
> ]
> I have some code ready to implement this and will open a PR shortly.
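
The idea described in the issue, one tokenizer name plus an ordered list of filter names, can be sketched without Lucene as a small name-driven pipeline. This is not the PR's `ConfigurableAnalyzer`: the class `ConfigurablePipelineSketch` and its methods are hypothetical, the tokenizer and filter implementations are plain-string stand-ins, and the accent stripping only approximates what an ASCII-folding filter would do.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain-string stand-ins, not Lucene tokenizers or filters.
final class ConfigurablePipelineSketch {

    // Select a tokenizer by the local name used in the configuration.
    static List<String> tokenize(String tokenizer, String text) {
        switch (tokenizer) {
            case "KeywordTokenizer":
                // Whole input becomes a single token.
                return List.of(text);
            case "LetterTokenizer": {
                // Split on runs of non-letter characters, dropping empty pieces.
                List<String> tokens = new ArrayList<>();
                for (String piece : text.split("[^\\p{L}]+")) {
                    if (!piece.isEmpty()) {
                        tokens.add(piece);
                    }
                }
                return tokens;
            }
            default:
                throw new IllegalArgumentException("unknown tokenizer: " + tokenizer);
        }
    }

    // Apply one named filter to one token.
    static String applyFilter(String filter, String token) {
        switch (filter) {
            case "LowerCaseFilter":
                return token.toLowerCase(Locale.ROOT);
            case "ASCIIFoldingFilter":
                // Rough approximation: decompose, then drop combining marks.
                return Normalizer.normalize(token, Normalizer.Form.NFD)
                        .replaceAll("\\p{M}", "");
            default:
                throw new IllegalArgumentException("unknown filter: " + filter);
        }
    }

    // Tokenize once, then run the filters in the configured order.
    static List<String> analyze(String tokenizer, List<String> filters, String text) {
        List<String> tokens = tokenize(tokenizer, text);
        for (String filter : filters) {
            List<String> next = new ArrayList<>();
            for (String token : tokens) {
                next.add(applyFilter(filter, token));
            }
            tokens = next;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mirrors the configuration quoted above.
        System.out.println(analyze(
                "KeywordTokenizer",
                List.of("ASCIIFoldingFilter", "LowerCaseFilter"),
                "Caf\u00e9 Noir"));
    }
}
```

Unknown names fail fast here, which matches the assembler's approach of throwing when the configuration is incomplete.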
--
This message was sent by Atlassian JIRA (v6.3.4#6332)