[
https://issues.apache.org/jira/browse/JENA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404028#comment-16404028
]
Code Ferret commented on JENA-1506:
-----------------------------------
The handling of {{text:params}} has been updated so that the parameter type
is derived automatically for {{int}}, {{boolean}}, and {{String}} values. For
those types a {{text:paramValue}} alone is sufficient, as in:
{code:java}
[ text:defineTokenizer :ngram ;
  text:tokenizer [
    a text:GenericTokenizer ;
    text:class "org.apache.lucene.analysis.ngram.NGramTokenizer" ;
    text:params (
      [ text:paramValue 3 ]
      [ text:paramValue 7 ]
    ) ] ]
[ text:defineFilter :asciiff ;
  text:filter [
    a text:GenericFilter ;
    text:class
      "org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter" ;
    text:params (
      [ text:paramName "preserveOriginal" ;
        text:paramValue true ]
    ) ] ]
{code}
Note that {{text:paramName}} is optional, so in the minimal case a parameter
specification needs only a {{text:paramValue}}.
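To illustrate the idea of deriving the type from the value, here is a minimal
Java sketch. It is not the jena-text implementation: the class
{{ParamTypeSketch}} and its methods are hypothetical names, and the values are
assumed to have already been parsed from the RDF literals into Java objects.
Under those assumptions, the two {{text:paramValue}} entries 3 and 7 above
would yield the parameter types (int, int), which can then be used to look up
a matching constructor by reflection.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; names and structure are assumptions,
// not the code used by the jena-text assemblers.
final class ParamTypeSketch {

    /** Maps an already-parsed literal value to the Java parameter type assumed for it. */
    static Class<?> deriveType(Object value) {
        if (value instanceof Integer) return int.class;      // e.g. text:paramValue 3
        if (value instanceof Boolean) return boolean.class;  // e.g. text:paramValue true
        if (value instanceof String)  return String.class;   // e.g. text:paramValue "abc"
        // Any other type cannot be derived and would need an explicit text:paramType.
        throw new IllegalArgumentException(
            "cannot derive parameter type for " + value.getClass().getName());
    }

    /** Derives the parameter types for a whole parameter list, in order. */
    static Class<?>[] deriveTypes(List<?> values) {
        Class<?>[] types = new Class<?>[values.size()];
        for (int i = 0; i < types.length; i++) {
            types[i] = deriveType(values.get(i));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // StringBuilder(int) stands in for a tokenizer or filter constructor
        // so that the sketch runs without Lucene on the classpath.
        List<Object> values = Arrays.<Object>asList(16);
        Constructor<?> ctor = StringBuilder.class.getConstructor(deriveTypes(values));
        Object instance = ctor.newInstance(values.toArray());
        System.out.println(instance.getClass().getName());
    }
}
```

Mapping to the primitive {{int.class}} and {{boolean.class}} rather than the
boxed types matters for the reflective lookup, since {{getConstructor}}
matches declared parameter types exactly.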
> Add configurable filters and tokenizers
> ---------------------------------------
>
> Key: JENA-1506
> URL: https://issues.apache.org/jira/browse/JENA-1506
> Project: Apache Jena
> Issue Type: New Feature
> Components: Text
> Affects Versions: Jena 3.7.0
> Reporter: Code Ferret
> Priority: Major
>
> In support of [JENA-1488|https://issues.apache.org/jira/browse/JENA-1488],
> this issue proposes to add a feature to allow including defined filters and
> tokenizers, similar to {{DefinedAnalyzer}}, for the {{ConfigurableAnalyzer}},
> allowing configurable arguments such as the {{excludeChars}}. I've looked at
> {{ConfigurableAnalyzer}} and its assembler and it should be straightforward.
> I would add tokenizer and filter definitions to {{TextIndexLucene}} similar
> to the support for adding analyzers:
> {code:java}
> text:defineFilters (
>   [ text:defineFilter <#foo> ;
>     text:filter [
>       a text:GenericFilter ;
>       text:class "fi.finto.FoldingFilter" ;
>       text:params (
>         [ text:paramName "excludeChars" ;
>           text:paramType text:TypeString ;
>           text:paramValue "whatevercharstoexclude" ]
>       )
>     ] ;
>   ]
> )
> {code}
> {{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} would make use of
> much of the code in {{GenericAnalyzerAssembler}}. The changes to
> {{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are
> straightforward and mostly involve retaining the resource URI rather than
> extracting the localName.
> Such an addition will make it easy to create new tokenizers and filters that
> could be dropped in by just adding the classes onto the jena/fuseki classpath
> or by referring to ones already included in Jena (via Lucene or otherwise)
> and putting the appropriate assembler bits in the configuration.