ASF GitHub Bot commented on JENA-1488:

GitHub user kinow opened a pull request:


    JENA-1488: add a selective folding analyzer

    This PR adds a selective folding analyzer, as explained in JENA-1488.
    It takes a list of characters, used as a white list. Everything that is not
    in the white list gets passed through the existing ASCIIFoldingFilter.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kinow/jena selective-folding-analyzer

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #395

commit de1bd22a58f76bbac41d16cb7111ed85b98279cd
Author: Bruno P. Kinoshita <kinow@...>
Date:   2018-04-09T09:38:14Z

    JENA-1488: add a selective folding analyzer


> SelectiveFoldingFilter for jena-text
> ------------------------------------
>                 Key: JENA-1488
>                 URL: https://issues.apache.org/jira/browse/JENA-1488
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Text
>    Affects Versions: Jena 3.6.0
>            Reporter: Osma Suominen
>            Assignee: Bruno P. Kinoshita
>            Priority: Major
> Currently there's some support for accent folding in jena-text, because 
> Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search 
> for "deja vu" will match the literal "déjà vu" in the data.
> But we can't use it here at the National Library of Finland (for Finto.fi / 
> Skosmos), because it folds too much! In the Finnish alphabet, in addition to 
> the Latin a-z (which are in ASCII) we use the letters åäö and these should 
> not be folded to ASCII. So we need a Lucene analyzer that can be configured 
> with an exclude list, something like 
> new SelectiveFoldingFilter(String excludeChars) 
> and that can also be configured via the Jena assembler just like other 
> analyzers supported by jena-text. 
> This was also briefly discussed on the skosmos-users mailing list: 
> [https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ] 
> Apparently Norwegians have the same problem...
> I've discussed this with [~kinow] and he has some initial code to implement 
> this feature, so I think we can turn this into a PR fairly soon.
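
For readers who want a feel for the behaviour requested above, here is a minimal sketch in plain Java. It is not the code from the pull request and it does not use Lucene: the class name SelectiveFoldingSketch and the method fold are hypothetical, and java.text.Normalizer stands in for ASCIIFoldingFilter. The stand-in only strips combining marks, so it covers fewer characters than the real filter (for example, it leaves letters such as "ø" untouched because they have no canonical decomposition).

```java
import java.text.Normalizer;

// Hypothetical illustration only; not the jena-text implementation.
public class SelectiveFoldingSketch {

    /**
     * Folds accented characters to their base letters, except for any
     * character listed in excludeChars, which is copied through unchanged.
     * Assumption: stripping combining marks after NFD decomposition is an
     * acceptable approximation of ASCII folding for this demonstration.
     */
    public static String fold(String input, String excludeChars) {
        // Compose both strings first so that "a" + combining diaeresis
        // is compared as the single code point U+00E4.
        String text = Normalizer.normalize(input, Normalizer.Form.NFC);
        String keep = Normalizer.normalize(excludeChars, Normalizer.Form.NFC);

        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);

            if (keep.indexOf(cp) >= 0) {
                // Character is on the exclude list: do not fold it.
                out.appendCodePoint(cp);
                continue;
            }

            // Decompose the single character and drop its combining marks.
            String decomposed = Normalizer.normalize(
                    new String(Character.toChars(cp)), Normalizer.Form.NFD);
            decomposed.codePoints()
                    .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                    .forEach(out::appendCodePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "déjà vu" with an empty exclude list: every accent is folded.
        System.out.println(fold("d\u00e9j\u00e0 vu", ""));

        // "Jyväskylä café" keeping the Finnish letters å, ä, ö (both cases):
        // the ä characters survive, the é in "café" is folded.
        String finnish = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6";
        System.out.println(fold("Jyv\u00e4skyl\u00e4 caf\u00e9", finnish));
    }
}
```

With an empty exclude list the sketch behaves like plain accent folding, so a query for "deja vu" and the literal "déjà vu" reduce to the same string. With "åäöÅÄÖ" excluded, those six letters stay distinct from a and o while other accents are still removed. A real implementation would apply the same per-character decision inside a Lucene TokenFilter rather than on whole strings.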

This message was sent by Atlassian JIRA
