[ https://issues.apache.org/jira/browse/JENA-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394281#comment-16394281 ]
Code Ferret commented on JENA-1488:
-----------------------------------
[Bruno P. Kinoshita|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=kinow],
as mentioned earlier, it is quite straightforward to add {{DefinedFilter}} and
{{DefinedTokenizer}} so that they work with the {{ConfigurableAnalyzer}}
without any backwards-compatibility issues. The {{SelectiveFoldingFilter}} can
then be added to Jena as a _built-in_ filter that can be configured as needed.
I'm happy to open a separate ticket on this if there is interest. I've sketched
the essence of the assembler syntax above. The implementation will use the same
framework as for {{GenericAnalyzerAssembler}} and friends. The
{{ConfigurableAnalyzer}} will be modified so that {{getTokenizer}} and
{{getTokenizerFilter}} use a {{Hashtable}}, as in {{Utils.java}}, to retrieve
the tokenizers and filters by name.
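To make the lookup-by-name idea concrete, here is a minimal sketch in plain Java. It is not the Jena code: {{ComponentRegistry}}, {{define}}, {{create}} and {{isDefined}} are hypothetical names, and a {{UnaryOperator<String>}} stands in for a real Lucene filter so that the example runs without Lucene on the classpath.

```java
import java.util.Hashtable;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a name-to-factory registry; not Jena's actual classes.
final class ComponentRegistry<T> {
    // A Hashtable keyed by the name used in the assembler configuration.
    private final Map<String, Supplier<T>> factories = new Hashtable<>();

    // Registers a factory under a name; refuses to overwrite an existing
    // entry so that built-in names keep their meaning.
    void define(String name, Supplier<T> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("Already defined: " + name);
        }
    }

    boolean isDefined(String name) {
        return factories.containsKey(name);
    }

    // Builds a fresh component for the given name, or fails loudly.
    T create(String name) {
        Supplier<T> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown component: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // UnaryOperator<String> is only a stand-in for a token filter.
        ComponentRegistry<UnaryOperator<String>> filters = new ComponentRegistry<>();
        filters.define("LowerCaseFilter", () -> s -> s.toLowerCase(Locale.ROOT));
        System.out.println(filters.create("LowerCaseFilter").apply("Jena Text"));
    }
}
```

Registering factories rather than instances means each analyzer gets its own filter object, which matters because Lucene filters carry per-stream state.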
What parameter types are needed for the {{SelectiveFoldingFilter}}?
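For reference while deciding on the parameters, the behaviour requested in the issue can be sketched in plain Java with a single {{String}} of excluded characters, following the constructor proposed in the description. This is only an illustration: {{SelectiveFoldingSketch}} and {{fold}} are hypothetical names, and the folding is done by stripping combining marks with {{java.text.Normalizer}}, which covers far fewer characters than Lucene's {{ASCIIFoldingFilter}} (letters such as ø or ß are left alone).

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of selective accent folding; not the proposed filter.
final class SelectiveFoldingSketch {
    // Code points that must be kept exactly as they are.
    private final Set<Integer> exclude = new HashSet<>();

    SelectiveFoldingSketch(String excludeChars) {
        // Compose first so that "a" + combining diaeresis is stored as one code point.
        String composed = Normalizer.normalize(excludeChars, Normalizer.Form.NFC);
        for (int cp : composed.codePoints().toArray()) {
            exclude.add(cp);
        }
    }

    String fold(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int cp : composed.codePoints().toArray()) {
            if (exclude.contains(cp)) {
                out.appendCodePoint(cp); // excluded: keep unchanged
                continue;
            }
            // Decompose and drop the combining marks (accents).
            String decomposed = Normalizer.normalize(
                    new String(Character.toChars(cp)), Normalizer.Form.NFD);
            for (int part : decomposed.codePoints().toArray()) {
                if (Character.getType(part) != Character.NON_SPACING_MARK) {
                    out.appendCodePoint(part);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Exclude list: the Finnish letters a-ring, a-diaeresis, o-diaeresis in both cases.
        SelectiveFoldingSketch folder =
                new SelectiveFoldingSketch("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6");
        System.out.println(folder.fold("d\u00e9j\u00e0 vu")); // prints "deja vu"
    }
}
```

A set of code points rather than a {{char[]}} keeps the sketch correct for characters outside the Basic Multilingual Plane, although a plain {{String}} parameter is probably the easiest type to express in the assembler.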
> SelectiveFoldingFilter for jena-text
> ------------------------------------
>
> Key: JENA-1488
> URL: https://issues.apache.org/jira/browse/JENA-1488
> Project: Apache Jena
> Issue Type: Improvement
> Components: Text
> Affects Versions: Jena 3.6.0
> Reporter: Osma Suominen
> Assignee: Bruno P. Kinoshita
> Priority: Major
>
> Currently there's some support for accent folding in jena-text, because
> Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search
> for "deja vu" will match the literal "déjà vu" in the data.
> But we can't use it here at the National Library of Finland (for Finto.fi /
> Skosmos), because it folds too much! In the Finnish alphabet, in addition to
> the Latin a-z (which are in ASCII) we use the letters åäö and these should
> not be folded to ASCII. So we need a Lucene analyzer that can be configured
> with an exclude list, something like
>
> new SelectiveFoldingFilter(String excludeChars)
>
> and that can also be configured via the Jena assembler just like other
> analyzers supported by jena-text.
>
> This was also briefly discussed on the skosmos-users mailing list:
> [https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ]
> Apparently Norwegians have the same problem...
> I've discussed this with [~kinow] and he has some initial code to implement
> this feature, so I think we can turn this into a PR fairly soon.