[ 
https://issues.apache.org/jira/browse/JENA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Code Ferret updated JENA-1506:
------------------------------
    Description: 
In support of [Jena-1488|https://issues.apache.org/jira/browse/JENA-1488], this 
issue proposes adding support for defined filters and tokenizers, similar to 
{{DefinedAnalyzer}}, for use with the {{ConfigurableAnalyzer}}, including 
configurable arguments such as {{excludeChars}}. I've looked at 
{{ConfigurableAnalyzer}} and its assembler and it should be straightforward.

I would add tokenizer and filter definitions to {{TextIndexLucene}} similar to 
the support for adding analyzers:
{code:java}
    text:defineFilters (
        [ text:defineFilter <#foo> ; 
          text:filter [ 
            a text:GenericFilter ;
            text:class "fi.finto.FoldingFilter" ;
            text:params (
                [ text:paramName "excludeChars" ;
                  text:paramType text:TypeString ; 
                  text:paramValue "whatevercharstoexclude" ]
                )
            ] ; 
          ]
      )
{code}
{{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} would make use of 
much of the code in {{GenericAnalyzerAssembler}}. The changes to 
{{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are 
straightforward and mostly involve retaining the resource URI rather than 
extracting the localName.
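
As a rough illustration of the parameter handling such a generic assembler would need, here is a minimal sketch that resolves a class by name and calls the constructor matching the declared parameter types. It is not the proposed implementation: {{GenericParamSketch}}, {{Param}} and {{instantiate}} are hypothetical names, and only the JDK is used so the sketch stays self-contained. A real filter assembler would presumably also have to supply the Lucene {{TokenStream}} input as the first constructor argument.

```java
import java.lang.reflect.Constructor;
import java.util.List;

public class GenericParamSketch {

    /** Hypothetical holder for one text:params entry: a Java type plus the configured value. */
    public static final class Param {
        public final Class<?> type;
        public final Object value;

        public Param(Class<?> type, Object value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Loads className and invokes the public constructor whose signature
     * matches the parameter types, in the order they were listed.
     */
    public static Object instantiate(String className, List<Param> params) {
        Class<?>[] types = new Class<?>[params.size()];
        Object[] values = new Object[params.size()];
        for (int i = 0; i < params.size(); i++) {
            types[i] = params.get(i).type;
            values[i] = params.get(i).value;
        }
        try {
            Constructor<?> ctor = Class.forName(className).getConstructor(types);
            return ctor.newInstance(values);
        } catch (ReflectiveOperationException e) {
            // Surface configuration mistakes (unknown class, no matching constructor) as one error.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

With a {{text:TypeString}} parameter mapped to {{String.class}}, the example above would amount to looking up a constructor that accepts a single {{String}} (after any leading input argument) and passing it the configured {{excludeChars}} value.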

Such an addition will make it easy to create new tokenizers and filters that 
could be dropped in by just adding the classes onto the jena/fuseki classpath 
or by referring to ones already included in Jena (via Lucene or otherwise) and 
putting the appropriate assembler bits in the configuration.

  was:
In support of Jena-1488, this issue proposes to add a feature to allow 
including defined filters and tokenizers, similar to {{DefinedAnalyzer}}, for 
the {{ConfigurableAnalyzer}}, allowing configurable arguments such as the 
{{excludeChars}}. I've looked at {{ConfigurableAnalyzer}} and its assembler and 
it should be straightforward.

I would add tokenizer and filter definitions to {{TextIndexLucene}} similar to 
the support for adding analyzers:
{code:java}
    text:defineFilters (
        [ text:defineFilter <#foo> ; 
          text:filter [ 
            a text:GenericFilter ;
            text:class "fi.finto.FoldingFilter" ;
            text:params (
                [ text:paramName "excludeChars" ;
                  text:paramType text:TypeString ; 
                  text:paramValue "whatevercharstoexclude" ]
                )
            ] ; 
          ]
      )
{code}
{{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} would make use of 
much of the code in {{GenericAnalyzerAssembler}}. The changes to 
{{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are 
straightforward and mostly involve retaining the resource URI rather than 
extracting the localName.

Such an addition will make it easy to create new tokenizers and filters that 
could be dropped in by just adding the classes onto the jena/fuseki classpath 
or by referring to ones already included in Jena (via Lucene or otherwise) and 
putting the appropriate assembler bits in the configuration.


> Add configurable filters and tokenizers
> ---------------------------------------
>
>                 Key: JENA-1506
>                 URL: https://issues.apache.org/jira/browse/JENA-1506
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.7.0
>            Reporter: Code Ferret
>            Priority: Major
>
> In support of [Jena-1488|https://issues.apache.org/jira/browse/JENA-1488], 
> this issue proposes adding support for defined filters and tokenizers, 
> similar to {{DefinedAnalyzer}}, for use with the {{ConfigurableAnalyzer}}, 
> including configurable arguments such as {{excludeChars}}. I've looked at 
> {{ConfigurableAnalyzer}} and its assembler and it should be straightforward.
> I would add tokenizer and filter definitions to {{TextIndexLucene}} similar 
> to the support for adding analyzers:
> {code:java}
>     text:defineFilters (
>         [ text:defineFilter <#foo> ; 
>           text:filter [ 
>             a text:GenericFilter ;
>             text:class "fi.finto.FoldingFilter" ;
>             text:params (
>                 [ text:paramName "excludeChars" ;
>                   text:paramType text:TypeString ; 
>                   text:paramValue "whatevercharstoexclude" ]
>                 )
>             ] ; 
>           ]
>       )
> {code}
> {{GenericFilterAssembler}} and {{GenericTokenizerAssembler}} would make use of 
> much of the code in {{GenericAnalyzerAssembler}}. The changes to 
> {{ConfigurableAnalyzer}} and {{ConfigurableAnalyzerAssembler}} are 
> straightforward and mostly involve retaining the resource URI rather than 
> extracting the localName.
> Such an addition will make it easy to create new tokenizers and filters that 
> could be dropped in by just adding the classes onto the jena/fuseki classpath 
> or by referring to ones already included in Jena (via Lucene or otherwise) 
> and putting the appropriate assembler bits in the configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
