Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-10 Thread DSpace @ Lyncode
Hi,

With ASCIIFoldingFilter, that query is not expected to fail.
So it is probably a configuration problem or an incorrect
deployment procedure.

-- 

Thanks, DSpace @ Lyncode
DSpace Department
*Lyncode*: Official website http://www.lyncode.com/
--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-10 Thread Claudia Jürgen
Hello,

You're right, it works fine; I guess I need new glasses.
The author was
Petuškova, Jekaterina
and I searched with
Petruskova
(note the extra "r").
I ran a couple more tests with characters from
http://en.wikipedia.org/wiki/Latin_characters_in_Unicode and all is well.

I'm still interested in documentation about the mapping.

Have a nice day

Claudia


-- 
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/


Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-10 Thread DSpace @ Lyncode
The full list of mappings is here:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/3.0.2/org/apache/lucene/analysis/ASCIIFoldingFilter.java#117
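
To get a feel for what folding does to the name discussed in this thread, here is a minimal sketch in plain Java. It is an assumption-laden stand-in, not Lucene's ASCIIFoldingFilter: it relies on java.text.Normalizer, so it only handles characters that have a canonical Unicode decomposition, whereas the Lucene filter works from the explicit mapping table linked above. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustrative stand-in only: this is NOT Lucene's ASCIIFoldingFilter.
// It folds only characters that have a canonical Unicode decomposition.
class FoldingSketch {

    /** Decompose to NFD, then drop the combining marks left behind. */
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Lowercase and fold, roughly what both the index and the query would go through. */
    static String normalize(String input) {
        return fold(input.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String indexed = normalize("Petu\u0161kova"); // "Petuškova" as stored
        System.out.println(indexed);
        System.out.println(indexed.equals(normalize("Petuskova")));  // query without the caron
        System.out.println(indexed.equals(normalize("Petruskova"))); // query with the stray "r"
    }
}
```

The last comparison fails because of the extra letter, not because of the diacritic. Characters such as ø have no canonical decomposition, so this sketch leaves them untouched; for those, the mapping table in the Lucene source is the place to look.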


Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-09 Thread emilio lorenzo

Hi,

The class ISOLatin1AccentFilter has been deprecated by Lucene (although
it can still be found) and replaced by the ASCIIFoldingFilter class.
For English plus Latin-language installations, we suggest the following
*org.dspace.search.DSAnalyzer* configuration (keep the order; it
matters for the searcher):


import org.apache.lucene.analysis.ASCIIFoldingFilter;
..
..
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopSet);
result = new ASCIIFoldingFilter(result);
result = new PorterStemFilter(result);
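
One possible reason the position of the folding step relative to the stemmer matters can be shown with a toy model in plain Java. Everything here is an assumption for illustration: the fold method uses java.text.Normalizer rather than ASCIIFoldingFilter, and toyStem is a hypothetical one-rule stemmer, far simpler than PorterStemFilter. No claim is made about how the real Porter stemmer treats any particular word.

```java
import java.text.Normalizer;
import java.util.Locale;

// Toy model of two analysis steps; none of this is Lucene code.
class FilterOrderSketch {

    // Stand-in for accent folding (assumption: NFD decomposition is enough here).
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Hypothetical stemmer: strips one trailing ASCII 'e' from longer tokens.
    static String toyStem(String token) {
        return token.length() > 4 && token.endsWith("e")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Order suggested in the message above: lowercase, fold, then stem.
    static String foldThenStem(String token) {
        return toyStem(fold(token.toLowerCase(Locale.ROOT)));
    }

    // Reverse order: the stemmer sees the accented token first.
    static String stemThenFold(String token) {
        return fold(toyStem(token.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String indexed = "Universit\u00e9"; // "Université" as stored in a record
        String query = "universite";        // what a user types without the accent

        System.out.println(foldThenStem(indexed).equals(foldThenStem(query)));
        System.out.println(stemThenFold(indexed).equals(stemThenFold(query)));
    }
}
```

In this toy model, folding first gives the stemmer the same input for both spellings, so the indexed term and the query term end up identical. Stemming first lets the accented and unaccented forms take different paths through the stemmer, and folding afterwards no longer reconciles them.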


Anyway, *org.dspace.search.DSAnalyzer* is the Lucene configuration.
The Solr configuration is quite different.


Good luck.
Emilio





Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-09 Thread Hatem Jlassi
Hi Emilio,

Thanks for your response. I added this code to the DSAnalyzer.java file and
rebuilt DSpace:
import org.apache.lucene.analysis.ASCIIFoldingFilter;
result = new ASCIIFoldingFilter(result);

Searching with accented characters works now, but how can French stop words be
removed from the indexes? At the moment, a search for a French stop word such as
Le, La, De or Dans displays all records that contain it. Only the English
stop words are removed.

Regards,




Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-09 Thread Claudia Jürgen
Hello Hatem,

At the moment, the list of stop words is defined in DSAnalyzer;
see:
 protected static final String[] STOP_WORDS =
 {

 // new stopwords (per MargretB)
 "a", "am", "and", "are", "as", "at", "be", "but", "by", "for",
 "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
 "the", "to", "was"
...
 };
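
As a rough sketch of what extending that list for a bilingual site could look like, the plain-Java example below merges the English entries with a few French ones and filters a token list. It is not DSpace or Lucene code: the class and method names are hypothetical, and the French array contains only the four words mentioned in this thread, so a real list would need to be longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of a bilingual stop list; not DSpace or Lucene code.
class BilingualStopWordsSketch {

    // English entries as quoted from DSAnalyzer in this thread.
    static final String[] ENGLISH = {
        "a", "am", "and", "are", "as", "at", "be", "but", "by", "for",
        "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
        "the", "to", "was"
    };

    // Assumption: only the four French words mentioned in the thread, lowercased.
    static final String[] FRENCH = { "le", "la", "de", "dans" };

    /** Build one combined stop set from both arrays. */
    static Set<String> stopSet() {
        Set<String> stop = new HashSet<>(Arrays.asList(ENGLISH));
        stop.addAll(Arrays.asList(FRENCH));
        return stop;
    }

    /** Lowercase each token, then keep it only if it is not a stop word. */
    static List<String> removeStopWords(List<String> tokens) {
        Set<String> stop = stopSet();
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stop.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords(Arrays.asList("La", "recherche", "dans", "the", "index")));
    }
}
```

If the stop filter runs after lowercasing and before accent folding, as in the order Emilio suggested, the entries would presumably need to be lowercase and keep their accents (for example "à" rather than "a"). As with the other analyzer changes in this thread, a rebuild and re-index would most likely be needed afterwards.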


Hope this helps

Claudia Jürgen




-- 
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/



Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-09 Thread Claudia Jürgen
Hello Emilio and all,

I have just taken a look at the ASCIIFoldingFilter, which should cover
most of the Latin characters (those with reasonable ASCII alternatives
are converted); see
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ASCIIFoldingFilter.html

I thought Latin Extended-A would be covered, but the first test with
the author name Petuškova, Jekaterina failed.
Is there a definitive list of which characters are supported, and how?

Cheers

Claudia







-- 
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/



[Dspace-tech] Searching : Diacritics Indexing

2012-08-08 Thread Hatem Jlassi
Hi all,

We are running a bilingual (French/English) instance of the latest version of
DSpace (1.8.2). We have a problem with searching for words with diacritics:
the DSpace search doesn't find words with accented characters when the query
doesn't include the accents.
We modified
(\dspace-1.8.2-src-release\dspace-api\src\main\java\org\dspace\search\DSAnalyzer.java)
and added the following two lines:
ISOLatin1AccentFilter;
result = new ISOLatin1AccentFilter(result);
Then we rebuilt and re-indexed DSpace,
but the problem was not resolved.

If anyone has solved this problem, please help! Thank you.

Regards,



Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-08 Thread Brian Freels-Stendel
Hi,

I think the problem may lie in the first line.  It should be

import org.apache.lucene.analysis.ISOLatin1AccentFilter;

and be included at the top of the file with the rest of the imports.  The 
second line looks fine, and goes with the rest of the filter statements.

B--



Re: [Dspace-tech] Searching : Diacritics Indexing

2012-08-08 Thread Hatem Jlassi
Hi,

Thanks for your response.
However, the source code is correct; the file content follows.

Regards,


/**
 * The contents of this file are subject to the license and copyright
 * detailed in the LICENSE and NOTICE files at the root of the source
 * tree and available online at
 *
 * http://www.dspace.org/license/
 */
package org.dspace.search;

import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.StopwordAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.util.Version;
import org.dspace.core.ConfigurationManager;

/**
 * Custom Lucene Analyzer that combines the standard filter, lowercase filter,
 * stemming and stopword filters.
 */
public class DSAnalyzer extends StopwordAnalyzerBase
{
    protected final Version matchVersion;

    /*
     * An array containing some common words that are not usually useful for
     * searching.
     */
    protected static final String[] STOP_WORDS =
    {
        // new stopwords (per MargretB)
        "a", "am", "and", "are", "as", "at", "be", "but", "by", "for",
        "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
        "the", "to", "was"
        // old stopwords (Lucene default)
        /*
         * "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
         * "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
         * "that", "the", "their", "then", "there", "these", "they", "this", "to",
         * "was", "will", "with"
         */
    };

    /*
     * Stop table
     */
    protected final Set stopSet;

    /**
     * Builds an analyzer
     * @param matchVersion Lucene version to match
     */
    public DSAnalyzer(Version matchVersion) {
        super(matchVersion, StopFilter.makeStopSet(matchVersion, STOP_WORDS));
        this.stopSet = StopFilter.makeStopSet(matchVersion, STOP_WORDS);
        this.matchVersion = matchVersion;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        final Tokenizer source = new DSTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, stopSet);
        result = new PorterStemFilter(result);
        result = new ISOLatin1AccentFilter(result);
        return new TokenStreamComponents(source, result);
    }

    @Override
    public int getPositionIncrementGap(String fieldName)
    {
        // If it is the default field, or bounded fields is turned off in the config, return the default value
        if ("default".equalsIgnoreCase(fieldName) ||
                !ConfigurationManager.getBooleanProperty("search.boundedfields", false))
        {
            return super.getPositionIncrementGap(fieldName);
        }

        // Not the default field, and we want bounded fields, so return a large gap increment
        return 10;
    }
}


