[https://issues.apache.org/jira/browse/JENA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990493#comment-14990493]
ASF GitHub Bot commented on JENA-1062:
--------------------------------------
Github user rvesse commented on the pull request:
https://github.com/apache/jena/pull/97#issuecomment-153874598
Looks good to me.
One open question: how does this interact with past work on language-specific
indexing, and with multilingual indexing in general?
It's been a while since I poked around Lucene, but I seem to remember that
it was often necessary to use alternative analysers, particularly when you get
into languages with compound words, non-Latin alphabets, symbolic alphabets,
etc. This shouldn't be a requirement for merging this work; I just wanted to
check that the current design won't preclude supporting this in the future.
Looking over the code, it seems it should be relatively easy to add new
analysers and filters as needed, but I wanted to make sure I had understood
the code correctly.
> add ConfigurableAnalyzer to jena-text
> -------------------------------------
>
> Key: JENA-1062
> URL: https://issues.apache.org/jira/browse/JENA-1062
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Reporter: Osma Suominen
> Assignee: Osma Suominen
>
> This is an alternative to JENA-1058 (which implemented a very specific Lucene
> Analyzer for jena-text). The idea here, based on a comment by Claude Warren
> on JENA-1058, is to provide a ConfigurableAnalyzer that can be configured
> with a Tokenizer and (optionally) one or more TokenFilters, like this:
> text:analyzer [
>   a text:ConfigurableAnalyzer ;
>   text:tokenizer text:KeywordTokenizer ;
>   text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
> ]
> I have some code ready to implement this and will open a PR shortly.
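To illustrate what the proposed configuration means, here is a minimal, self-contained Java sketch of the same idea: one tokenizer followed by an ordered chain of filters, applied in list order as in a Lucene analysis chain. It does not use Lucene or the actual jena-text code. The Tokenizer, TokenFilter and ConfigurableAnalyzer types below are hypothetical stand-ins. The ASCII folding is also only an approximation: it strips combining accents via Unicode NFD decomposition, whereas Lucene's ASCIIFoldingFilter additionally maps characters such as ø or ß.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class ConfigurableAnalyzerSketch {

    // Hypothetical stand-ins for Lucene's Tokenizer and TokenFilter (NOT the Lucene API):
    // a tokenizer turns field text into tokens, and each filter rewrites the token list.
    interface Tokenizer extends Function<String, List<String>> {}
    interface TokenFilter extends UnaryOperator<List<String>> {}

    // Analogue of text:KeywordTokenizer: the whole input becomes a single token.
    static final Tokenizer KEYWORD = text -> List.of(text);

    // Rough analogue of text:ASCIIFoldingFilter: decompose each character (NFD)
    // and drop combining marks, so "é" becomes "e". Does not handle ø, ß, etc.
    static final TokenFilter ASCII_FOLDING = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String decomposed = Normalizer.normalize(t, Normalizer.Form.NFD);
            out.add(decomposed.replaceAll("\\p{M}", ""));
        }
        return out;
    };

    // Analogue of text:LowerCaseFilter: lower-case every token (locale-independent).
    static final TokenFilter LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Hypothetical configurable analyzer: one tokenizer plus zero or more filters.
    static final class ConfigurableAnalyzer {
        private final Tokenizer tokenizer;
        private final List<TokenFilter> filters;

        ConfigurableAnalyzer(Tokenizer tokenizer, List<TokenFilter> filters) {
            this.tokenizer = tokenizer;
            this.filters = new ArrayList<>(filters);
        }

        // Tokenize first, then run each filter over the token list in order.
        List<String> analyze(String text) {
            List<String> tokens = tokenizer.apply(text);
            for (TokenFilter f : filters) {
                tokens = f.apply(tokens);
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // Mirrors: text:tokenizer text:KeywordTokenizer ;
        //          text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
        ConfigurableAnalyzer analyzer =
            new ConfigurableAnalyzer(KEYWORD, List.of(ASCII_FOLDING, LOWER_CASE));
        System.out.println(analyzer.analyze("Crème Brûlée")); // prints [creme brulee]
    }
}
```

In this model, supporting another language would mean supplying a different Tokenizer or an extra TokenFilter implementation rather than changing the analyzer itself, which is the kind of extensibility asked about in the comment.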