[ https://issues.apache.org/jira/browse/JENA-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435506#comment-16435506 ]
ASF GitHub Bot commented on JENA-1488:
--------------------------------------

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/395

Example configuration used for testing:

```
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf  ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf  text:TextIndex .

[] rdf:type fuseki:Server ;
    fuseki:services (
        <#service_text_tdb>
    ) .

<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset :text_dataset ;
    .

:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/tmp/db" ;
    tdb:unionDefaultGraph true ; # Optional
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/tmp/lucene> ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    text:defineAnalyzers (
        [ text:defineAnalyzer <#configuredAnalyzer> ;
          text:analyzer [
              a text:ConfigurableAnalyzer ;
              text:tokenizer <#tokenizer> ;
              text:filters ( :selectiveFoldingFilter text:LowerCaseFilter )
          ]
        ]
        [ text:defineTokenizer <#tokenizer> ;
          text:tokenizer [
              a text:GenericTokenizer ;
              text:class "org.apache.lucene.analysis.core.LowerCaseTokenizer"
          ]
        ]
        [ text:defineFilter :selectiveFoldingFilter ;
          text:filter [
              a text:GenericFilter ;
              text:class "org.apache.jena.query.text.filter.SelectiveFoldingFilter" ;
              text:params (
                  [ text:paramName "whitelisted" ;
                    text:paramType text:TypeSet ;
                    text:paramValue ("ç" "ä") ]
              )
          ]
        ]
    ) ;
    text:analyzer [
        a text:DefinedAnalyzer ;
        text:useAnalyzer <#configuredAnalyzer>
    ] ;
    text:queryAnalyzer [
        a text:DefinedAnalyzer ;
        text:useAnalyzer <#configuredAnalyzer>
    ] ;
    text:queryParser text:AnalyzingQueryParser ;
    text:multilingualSupport true ;
    .

<#entMap> a text:EntityMap ;
    text:defaultField "pref" ;
    text:entityField  "uri" ;
    text:uidField     "uid" ;
    text:langField    "lang" ;
    text:graphField   "graph" ;
    text:map (
        # skos:prefLabel
        [ text:field "pref" ; text:predicate skos:prefLabel ]
        # skos:altLabel
        [ text:field "alt" ; text:predicate skos:altLabel ]
        # skos:hiddenLabel
        [ text:field "hidden" ; text:predicate skos:hiddenLabel ]
    ) .
```

> SelectiveFoldingFilter for jena-text
> ------------------------------------
>
>                 Key: JENA-1488
>                 URL: https://issues.apache.org/jira/browse/JENA-1488
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Text
>    Affects Versions: Jena 3.6.0
>            Reporter: Osma Suominen
>            Assignee: Bruno P. Kinoshita
>            Priority: Major
>
> Currently there's some support for accent folding in jena-text, because
> Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search
> for "deja vu" will match the literal "déjà vu" in the data.
> But we can't use it here at the National Library of Finland (for Finto.fi /
> Skosmos), because it folds too much! In the Finnish alphabet, in addition to
> the Latin a-z (which are in ASCII) we use the letters åäö and these should
> not be folded to ASCII. So we need a Lucene analyzer that can be configured
> with an exclude list, something like
>
> new SelectiveFoldingFilter(String excludeChars)
>
> and that can also be configured via the Jena assembler just like other
> analyzers supported by jena-text.
>
> This was also briefly discussed on the skosmos-users mailing list:
> [https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ]
> Apparently Norwegians have the same problem...
> I've discussed this with [~kinow] and he has some initial code to implement
> this feature, so I think we can turn this into a PR fairly soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
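
For illustration only, the exclude-list idea from the issue description can be sketched in plain Java without any Lucene dependency. This is not the code from the pull request and not Lucene's ASCIIFoldingFilter: the class name `SelectiveFolding` and the method `fold` are hypothetical, and the sketch only strips combining marks via Unicode decomposition, so letters that do not decompose (such as ø or ß) pass through unchanged, whereas ASCIIFoldingFilter maps those too.

```java
import java.text.Normalizer;
import java.util.Set;

// Hypothetical helper for illustration; NOT the SelectiveFoldingFilter from the PR.
final class SelectiveFolding {

    // Removes accents from every character except the code points listed in `keep`.
    static String fold(String input, Set<Integer> keep) {
        // Compose first so that "a" + combining diaeresis is seen as one a-umlaut.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        composed.codePoints().forEach(cp -> {
            if (keep.contains(cp)) {
                // Excluded from folding: copy through untouched.
                out.appendCodePoint(cp);
            } else {
                // Decompose the single character and drop its combining marks.
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                decomposed.codePoints()
                        .filter(c -> Character.getType(c) != Character.NON_SPACING_MARK)
                        .forEach(out::appendCodePoint);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // a-ring, a-umlaut and o-umlaut stay as they are (lower case only here).
        Set<Integer> keep = Set.of((int) '\u00e5', (int) '\u00e4', (int) '\u00f6');
        System.out.println(fold("d\u00e9j\u00e0 vu", keep)); // deja vu
        System.out.println(fold("\u00e4iti", keep));         // a-umlaut is kept
    }
}
```

The keep set is compared by exact code point, so upper-case letters would need to be listed separately unless the text is lower-cased before folding.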