[
https://issues.apache.org/jira/browse/JENA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991670#comment-14991670
]
ASF GitHub Bot commented on JENA-1062:
--------------------------------------
Github user ajs6f commented on a diff in the pull request:
https://github.com/apache/jena/pull/97#discussion_r44012158
--- Diff:
jena-text/src/main/java/org/apache/jena/query/text/analyzer/ConfigurableAnalyzer.java
---
@@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.query.text.analyzer ;
+
+import java.io.Reader ;
+import java.util.List ;
+
+import org.apache.jena.query.text.TextIndexException;
+import org.apache.lucene.analysis.Analyzer ;
+import org.apache.lucene.analysis.TokenFilter ;
+import org.apache.lucene.analysis.Tokenizer ;
+import org.apache.lucene.analysis.TokenStream ;
+import org.apache.lucene.analysis.core.KeywordTokenizer ;
+import org.apache.lucene.analysis.core.LetterTokenizer;
+import org.apache.lucene.analysis.core.LowerCaseFilter ;
+import org.apache.lucene.analysis.core.WhitespaceTokenizer ;
+import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
+import org.apache.lucene.analysis.standard.StandardFilter;
+import org.apache.lucene.analysis.standard.StandardTokenizer;
+import org.apache.lucene.util.Version ;
+
+
+/**
+ * Lucene Analyzer implementation that can be configured with different
+ * Tokenizer and (optionally) TokenFilter implementations.
+ */
+
+public class ConfigurableAnalyzer extends Analyzer {
+ private Version version;
--- End diff --
Couldn't these files be `final`?
> add ConfigurableAnalyzer to jena-text
> -------------------------------------
>
> Key: JENA-1062
> URL: https://issues.apache.org/jira/browse/JENA-1062
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Reporter: Osma Suominen
> Assignee: Osma Suominen
>
> This is an alternative to JENA-1058 (which implemented a very specific Lucene
> Analyzer for jena-text). The idea here, based on a comment by Claude Warren
> on JENA-1058, is to provide a ConfigurableAnalyzer that can be configured
> with a Tokenizer and (optionally) one or more TokenFilters, like this:
> text:analyzer [
> a text:ConfigurableAnalyzer ;
> text:tokenizer text:KeywordTokenizer ;
> text:filters (text:ASCIIFoldingFilter, text:LowerCaseFilter)
> ]
> I have some code ready to implement this and will open a PR shortly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)