[
https://issues.apache.org/jira/browse/JENA-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435506#comment-16435506
]
ASF GitHub Bot commented on JENA-1488:
--------------------------------------
Github user kinow commented on the issue:
https://github.com/apache/jena/pull/395
Example configuration used for testing:
```
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .

[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

[] rdf:type fuseki:Server ;
    fuseki:services (
        <#service_text_tdb>
    ) .

<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset ;
    .

:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/tmp/db" ;
    tdb:unionDefaultGraph true ; # Optional
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/tmp/lucene> ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    text:defineAnalyzers (
        [
            text:defineAnalyzer <#configuredAnalyzer> ;
            text:analyzer [
                a text:ConfigurableAnalyzer ;
                text:tokenizer <#tokenizer> ;
                text:filters ( :selectiveFoldingFilter text:LowerCaseFilter )
            ]
        ]
        [
            text:defineTokenizer <#tokenizer> ;
            text:tokenizer [
                a text:GenericTokenizer ;
                text:class "org.apache.lucene.analysis.core.LowerCaseTokenizer"
            ]
        ]
        [
            text:defineFilter :selectiveFoldingFilter ;
            text:filter [
                a text:GenericFilter ;
                text:class "org.apache.jena.query.text.filter.SelectiveFoldingFilter" ;
                text:params (
                    [
                        text:paramName "whitelisted" ;
                        text:paramType text:TypeSet ;
                        text:paramValue ("ç" "ä")
                    ]
                )
            ]
        ]
    ) ;
    text:analyzer [
        a text:DefinedAnalyzer ;
        text:useAnalyzer <#configuredAnalyzer>
    ] ;
    text:queryAnalyzer [
        a text:DefinedAnalyzer ;
        text:useAnalyzer <#configuredAnalyzer>
    ] ;
    text:queryParser text:AnalyzingQueryParser ;
    text:multilingualSupport true ;
    .

<#entMap> a text:EntityMap ;
    text:defaultField "pref" ;
    text:entityField "uri" ;
    text:uidField "uid" ;
    text:langField "lang" ;
    text:graphField "graph" ;
    text:map (
        # skos:prefLabel
        [ text:field "pref" ;
          text:predicate skos:prefLabel
        ]
        # skos:altLabel
        [ text:field "alt" ;
          text:predicate skos:altLabel
        ]
        # skos:hiddenLabel
        [ text:field "hidden" ;
          text:predicate skos:hiddenLabel
        ]
    )
    .
```
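To illustrate what the `whitelisted` set in the configuration above is meant to achieve, here is a rough, Lucene-free sketch of selective folding. It is not the code in this pull request: the class name `SelectiveFoldingSketch` and the method `fold` are hypothetical, and stripping combining marks after Unicode decomposition is only an approximation of what Lucene's ASCIIFoldingFilter does. Characters in the keep set pass through unchanged; everything else loses its accents.

```java
import java.text.Normalizer;
import java.util.Set;

// Hypothetical illustration only; not the SelectiveFoldingFilter from the PR.
public class SelectiveFoldingSketch {

    // Strip accents from every character except the code points listed in `keep`.
    static String fold(String input, Set<Integer> keep) {
        // Compose first, so "a" + combining diaeresis is seen as the single code point U+00E4.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder();
        composed.codePoints().forEach(cp -> {
            if (keep.contains(cp)) {
                // Whitelisted character: keep it exactly as it is.
                out.appendCodePoint(cp);
            } else {
                // Decompose and drop combining marks (an approximation of ASCII folding).
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                decomposed.codePoints()
                        .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                        .forEach(out::appendCodePoint);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Same two characters as the "whitelisted" parameter: c-cedilla and a-diaeresis.
        Set<Integer> keep = Set.of((int) '\u00e7', (int) '\u00e4');
        // "deja vu" written with e-acute and a-grave: both accents are folded away.
        System.out.println(fold("d\u00e9j\u00e0 vu", keep));   // prints "deja vu"
        // h, a-diaeresis twice, y, o-diaeresis: only the o-diaeresis is folded.
        System.out.println(fold("h\u00e4\u00e4y\u00f6", keep)); // a-diaeresis is preserved
    }
}
```

With this keep set, a search term containing "ä" only matches literals that also contain "ä", while other accented letters still match their unaccented forms. A Finnish deployment would presumably list å, ä and ö instead.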
> SelectiveFoldingFilter for jena-text
> ------------------------------------
>
> Key: JENA-1488
> URL: https://issues.apache.org/jira/browse/JENA-1488
> Project: Apache Jena
> Issue Type: Improvement
> Components: Text
> Affects Versions: Jena 3.6.0
> Reporter: Osma Suominen
> Assignee: Bruno P. Kinoshita
> Priority: Major
>
> Currently there's some support for accent folding in jena-text, because
> Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search
> for "deja vu" will match the literal "déjà vu" in the data.
> But we can't use it here at the National Library of Finland (for Finto.fi /
> Skosmos), because it folds too much! In the Finnish alphabet, in addition to
> the Latin a-z (which are in ASCII) we use the letters åäö and these should
> not be folded to ASCII. So we need a Lucene analyzer that can be configured
> with an exclude list, something like
>
> new SelectiveFoldingFilter(String excludeChars)
>
> and that can also be configured via the Jena assembler just like other
> analyzers supported by jena-text.
>
> This was also briefly discussed on the skosmos-users mailing list:
> [https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ]
> Apparently Norwegians have the same problem...
> I've discussed this with [~kinow] and he has some initial code to implement
> this feature, so I think we can turn this into a PR fairly soon.