Re: [Dspace-tech] Searching : Diacritics Indexing
Hi,

With ASCIIFoldingFilter that query is not expected to fail, so this is probably a configuration problem or a faulty deployment procedure.

On 9 August 2012 16:12, Claudia Jürgen wrote:
> Thought Latin Extended A would be covered, but the first test with the author name Petuškova, Jekaterina failed.

--
Thanks,
DSpace @ Lyncode
DSpace Department
*Lyncode*: Official website http://www.lyncode.com/

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Searching : Diacritics Indexing
Hello,

you're right, it works fine; I guess I need some new glasses. The author was Petuškova, Jekaterina, and I had searched for Petruskova.

I made a couple more tests with characters from http://en.wikipedia.org/wiki/Latin_characters_in_Unicode and all is well.

I'm still interested in documentation about the mapping.

Have a nice day
Claudia

On 10.08.2012 12:09, DSpace @ Lyncode wrote:
> With ASCIIFoldingFilter that query is not expected to fail, so this is probably a configuration problem or a faulty deployment procedure.

--
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/
Re: [Dspace-tech] Searching : Diacritics Indexing
Full list of mappings:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/3.0.2/org/apache/lucene/analysis/ASCIIFoldingFilter.java#117

On 10 August 2012 12:37, Claudia Jürgen wrote:
> I'm still interested in documentation about the mapping.
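The linked source is essentially one large character table. As a minimal sketch of that idea in plain Java (no Lucene on the classpath), the class below folds a string through a small lookup table. FoldTableSketch and its six entries are hypothetical and chosen only for illustration; the replacements shown are the ones the Lucene table is believed to use, so please check them against the linked file before relying on them.

```java
import java.util.HashMap;
import java.util.Map;

final class FoldTableSketch {
    // Tiny illustrative table; the real filter covers far more characters.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0161', "s");  // s with caron
        TABLE.put('\u00e9', "e");  // e with acute
        TABLE.put('\u00df', "ss"); // sharp s, folds to two letters
        TABLE.put('\u00e6', "ae"); // ae ligature
        TABLE.put('\u00f8', "o");  // o with stroke
        TABLE.put('\u0142', "l");  // l with stroke
    }

    // Replace every character found in the table; copy all others unchanged.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = TABLE.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Petu\u0161kova")); // the author name from this thread
        System.out.println(fold("Stra\u00dfe"));
    }
}
```

A table like this can map one character to several (sharp s to "ss"), which a pure accent-stripping approach cannot do.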
Re: [Dspace-tech] Searching : Diacritics Indexing
Hi,

The class ISOLatin1AccentFilter has been deprecated by Lucene (although it can still be found) and replaced by the ASCIIFoldingFilter class.

For English + Latin-language installations, we suggest the following *org.dspace.search.DSAnalyzer* configuration (keep the order; it is relevant for the searcher):

    import org.apache.lucene.analysis.ASCIIFoldingFilter;
    ..
    ..
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new ASCIIFoldingFilter(result);
    result = new PorterStemFilter(result);

Note that *org.dspace.search.DSAnalyzer* corresponds to the Lucene configuration. The SOLR configuration is quite different.

Best of luck.
Emilio

On 08/08/2012 20:14, Hatem Jlassi wrote:
> The Dspace's searcher doesn't find words with accented characters when the search doesn't include these accents.
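To illustrate what the lower-casing and accent-folding steps of the suggested chain do to a single token, here is a minimal stand-alone sketch in plain Java. It does not use Lucene: it approximates folding with java.text.Normalizer (decompose, then drop combining marks), which is an assumption made for illustration and covers fewer characters than ASCIIFoldingFilter. FoldingSketch and fold are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

final class FoldingSketch {
    // Lower-case first, then decompose to NFD and drop the combining marks.
    static String fold(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Universit\u00e9" is "Universite" with an acute accent on the final e.
        System.out.println(fold("Universit\u00e9"));
        System.out.println(fold("universite"));
    }
}
```

Both calls yield the same string, which is the point: assuming the same analyzer runs at index time and at query time, the accented and unaccented forms meet on one term. That is also why the index has to be rebuilt after the chain is changed.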
Re: [Dspace-tech] Searching : Diacritics Indexing
Hi Emilio,

Thanks for your response. I added this code to the DSAnalyzer.java file and rebuilt DSpace:

    import org.apache.lucene.analysis.ASCIIFoldingFilter;
    result = new ASCIIFoldingFilter(result);

Search now works with accented characters, but how can French stop words be removed from the indexes? At the moment, a search for a French stop word such as Le, La, De or Dans displays all records that contain that word. Only the English stop words are removed.

Regards,

From: emilio lorenzo [mailto:elore...@arvo.es]
Sent: 9 August 2012 03:18
To: Hatem Jlassi; dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Searching : Diacritics Indexing

> The class ISOLatin1AccentFilter has been deprecated by Lucene (although still can be found...) and substituted by ASCIIFoldingFilter class
Re: [Dspace-tech] Searching : Diacritics Indexing
Hello Hatem,

at the moment the list of stop words is defined in DSAnalyzer, see

    protected static final String[] STOP_WORDS = {
        // new stopwords (per MargretB)
        "a", "am", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "the", "to", "was"
        ...
    };

Hope this helps
Claudia Jürgen

On 09.08.2012 16:09, Hatem Jlassi wrote:
> It works now for search with accented characters, but how to remove French stop words from indexes.
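One way to read the answer above is that French words can simply be appended to that array. The sketch below shows the idea in plain Java without Lucene. The English entries are the ones posted in this thread; the French entries, the class name BilingualStopWordsSketch and the helper methods are hypothetical and only illustrate the approach, they are not a vetted French stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class BilingualStopWordsSketch {
    // English entries as listed for DSAnalyzer.STOP_WORDS in this thread.
    static final String[] ENGLISH_STOP_WORDS = {
        "a", "am", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "the", "to", "was"
    };

    // Hypothetical French additions, for illustration only.
    static final String[] FRENCH_STOP_WORDS = {
        "le", "la", "les", "de", "des", "du", "dans", "un", "une", "et"
    };

    // Merge both lists into a single lookup set.
    static Set<String> buildStopSet() {
        Set<String> stopSet = new HashSet<>(Arrays.asList(ENGLISH_STOP_WORDS));
        stopSet.addAll(Arrays.asList(FRENCH_STOP_WORDS));
        return stopSet;
    }

    // Lower-case each token, then keep it only if it is not a stop word.
    static List<String> removeStopWords(List<String> tokens) {
        Set<String> stopSet = buildStopSet();
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopSet.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [chat, garden]
        System.out.println(removeStopWords(Arrays.asList("Le", "chat", "dans", "the", "garden")));
    }
}
```

In the chain Emilio suggested, StopFilter runs after LowerCaseFilter but before ASCIIFoldingFilter, so entries would presumably need to be lower-case and, for accented stop words, written with their accents. As with the filter change, a rebuild and re-index would be needed before the new list takes effect.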
Re: [Dspace-tech] Searching : Diacritics Indexing
Hello Emilio and all,

I just took a look at the ASCIIFoldingFilter, which should cover most of the Latin characters (those with reasonable ASCII alternatives are converted), see
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ASCIIFoldingFilter.html

I thought Latin Extended-A would be covered, but the first test with the author name Petuškova, Jekaterina failed.

Is there a definitive list of which characters are supported and how they are mapped?

Cheers
Claudia

On 09.08.2012 09:14, emilio lorenzo wrote:
> The class ISOLatin1AccentFilter has been deprecated by Lucene (although still can be found...) and substituted by ASCIIFoldingFilter class

--
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/
[Dspace-tech] Searching : Diacritics Indexing
Hi all,

We are running a bilingual (French/English) instance of the latest version of DSpace (1.8.2). We have a problem with searching for words with diacritics: the DSpace searcher doesn't find words with accented characters when the search term doesn't include the accents.

We modified (\dspace-1.8.2-src-release\dspace-api\src\main\java\org\dspace\search\DSAnalyzer.java) and added the following two lines:

    ISOLatin1AccentFilter;
    result = new ISOLatin1AccentFilter(result);

We then rebuilt and re-indexed DSpace, but the problem was not resolved.

If anyone has solved this problem, please help!

Thank you.
Regards,
Re: [Dspace-tech] Searching : Diacritics Indexing
Hi,

I think the problem may lie in the first line. It should be

    import org.apache.lucene.analysis.ISOLatin1AccentFilter;

and be included at the top of the file with the rest of the imports. The second line looks fine, and goes with the rest of the filter statements.

B--

On 8/8/2012 at 12:14 PM, in message 85c980bb1085994793231c3abd13a5629ee...@xmbx03.sti.usherbrooke.ca, Hatem Jlassi hatem.jla...@usherbrooke.ca wrote:
> We modified (\dspace-1.8.2-src-release\dspace-api\src\main\java\org\dspace\search\DSAnalyzer.java) and we added the following two lines: ISOLatin1AccentFilter; result = new ISOLatin1AccentFilter(result);
Re: [Dspace-tech] Searching : Diacritics Indexing
Hi,

Thanks for your response. However, the source code is correct; the file content follows.

Regards,

    /**
     * The contents of this file are subject to the license and copyright
     * detailed in the LICENSE and NOTICE files at the root of the source
     * tree and available online at
     *
     * http://www.dspace.org/license/
     */
    package org.dspace.search;

    import java.io.Reader;
    import java.util.Set;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.ISOLatin1AccentFilter;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.StopwordAnalyzerBase;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.util.Version;
    import org.dspace.core.ConfigurationManager;

    /**
     * Custom Lucene Analyzer that combines the standard filter, lowercase filter,
     * stemming and stopword filters.
     */
    public class DSAnalyzer extends StopwordAnalyzerBase
    {
        protected final Version matchVersion;

        /*
         * An array containing some common words that are not usually useful for
         * searching.
         */
        protected static final String[] STOP_WORDS =
        {
            // new stopwords (per MargretB)
            "a", "am", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "the", "to", "was"

            // old stopwords (Lucene default)
            /*
             * "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
             * "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
             * "that", "the", "their", "then", "there", "these", "they", "this", "to",
             * "was", "will", "with"
             */
        };

        /*
         * Stop table
         */
        protected final Set stopSet;

        /**
         * Builds an analyzer
         * @param matchVersion Lucene version to match
         */
        public DSAnalyzer(Version matchVersion) {
            super(matchVersion, StopFilter.makeStopSet(matchVersion, STOP_WORDS));
            this.stopSet = StopFilter.makeStopSet(matchVersion, STOP_WORDS);
            this.matchVersion = matchVersion;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            final Tokenizer source = new DSTokenizer(matchVersion, reader);
            TokenStream result = new StandardFilter(matchVersion, source);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(matchVersion, result);
            result = new StopFilter(matchVersion, result, stopSet);
            result = new PorterStemFilter(result);
            result = new ISOLatin1AccentFilter(result);
            return new TokenStreamComponents(source, result);
            return result;
        }

        @Override
        public int getPositionIncrementGap(String fieldName)
        {
            // If it is the default field, or bounded fields is turned off in the config, return the default value
            if ("default".equalsIgnoreCase(fieldName) || !ConfigurationManager.getBooleanProperty("search.boundedfields", false))
            {
                return super.getPositionIncrementGap(fieldName);
            }

            // Not the default field, and we want bounded fields, so return a large gap increment
            return 10;
        }
    }

-----Original Message-----
From: Brian Freels-Stendel [mailto:bfre...@unm.edu]
Sent: 8 August 2012 14:41
To: DSpace-tech@lists.sourceforge.net; Hatem Jlassi
Subject: Re: [Dspace-tech] Searching : Diacritics Indexing

> I think the problem may lie in the first line. It should be import org.apache.lucene.analysis.ISOLatin1AccentFilter; and be included at the top of the file with the rest of the imports.