hi thanks for your reply. Please suggest me what to do now. i want to index the document which contains multiple languages.
I really waiting for this to complete with your help. Please,please help me Erick Erickson wrote: > > I'm puzzled why you say > > "By the above out put we can say that StandardAnalyzer is > enough to get rid of danish elements." > > It does NOT get rid of the accents, according to your own output. > If your goal is to go ahead and index multiple language documents > in a single index then search it, I'd recommend using the > ISOLatin1AccentFilter. Compare the output of an analyzer like: > > public class MyAnalyzer extends StandardAnalyzer { > public TokenStream tokenStream(String s, Reader reader) { > return new ISOLatin1AccentFilter(super.tokenStream(s, reader)); > } > } > and you'll see that all the accents disappear, which will play much > more nicely with queries that don't have accents. > > Best > Erick > > On Fri, Apr 24, 2009 at 3:53 AM, uday kumar maddigatla > <u...@mach.com>wrote: > >> >> Hi Thanks for your reply. >> >> After gone threw with the site which you given... i understood that >> StandardAnalyzer is enough to handle these special characters. >> >> i'm attaching one class called AnalysisDemo.java. By executing that class >> i'm able to say the above sentance(i.e StandardAnalyzer is enough). >> >> Here is the out put when i ran the above java file. >> Analzying "Vedr. : Amtsgården Århus, Lyseng Allé 1, 8270 >> Højbjerg" >> org.apache.lucene.analysis.WhitespaceAnalyzer: >> [Vedr.] [:] [Amtsgården] [Århus,] [Lyseng] [Allé] [1,] >> [8270] [Højbjerg] >> >> org.apache.lucene.analysis.SimpleAnalyzer: >> [vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg] >> >> org.apache.lucene.analysis.StopAnalyzer: >> [vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg] >> >> org.apache.lucene.analysis.standard.StandardAnalyzer: >> [vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] >> [højbjerg] >> >> org.apache.lucene.analysis.snowball.SnowballAnalyzer: >> [vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] >> [højbjerg] >> >> By the above out put we can say that StandardAnalyzer is enough to get >> rid >> of danish elements. >> >> But only problem is when i'm searching the any term which includes the >> danish elements(like højbjerg...) >> >> it is unable to find out. >> >> Even i checked with LUKE. In that i given my sample text which contains >> the >> danish elements and selected the StandardAnalyzer as analyser. when i >> click >> analyze in that it cleary making index of danish words. >> >> and also i givne one try on luke by loading my index directory in to >> luke. >> after loading my index i searched for a word which contains the danish >> element, But this time it was failed. It was shown nothing(i.e o >> resluts). >> >> As in my sense the problem might be making the indexes or in searching >> the >> item. >> >> >> I gone threw the site which you given. From that i'm able to do this kind >> of >> reaserch work. >> >> Please help me in this. >> >> >> Erick Erickson wrote: >> > >> > OK, this is a much different problem than you were originally >> > asking about, effectively "how to index/search mixed language >> > documents". >> > >> > This topic has been discussed multiple times on the user list, I >> > think your first step should be to search the archive. I *was* >> > going to find the old searchable mail archive, but those clever folks >> > at Lucid Imagination have something new, see: >> > >> > http://www.lucidimagination.com/search/p:lucene?q=multiple+languages >> > >> > Once you've had a chance to look that over I think you'll be off and >> > running. >> > >> > Best >> > Erick >> > >> > On Thu, Apr 23, 2009 at 1:43 AM, uday kumar maddigatla >> > <u...@mach.com>wrote: >> > >> >> >> >> HI >> >> >> >> Here are the details about my goals. >> >> 1. I want to use this lucene for mixed languages. >> >> 2. I want to make indexes of the documents which are either english or >> >> danish etc. >> >> I'm attaching my IndexFiles.java file. >> >> >> >> When i'm searching i'm giving the index path location as well as >> >> doucmets >> >> folder. >> >> >> >> If i use StandardAnalyzer as an argument to IndexWriter's method it is >> >> able >> >> to search the english characters. >> >> >> >> How can i use DutchAnalyzer in order to make this IndexFiles.java to >> >> index >> >> the danish elements. >> >> >> >> In my Code which i attached, you can see 'C:\test3'. This is my >> location >> >> where i want to store my indexes. >> >> >> >> I'm giving documents folder location as comand line argument. >> >> >> >> In my document the content will be like this >> >> >> >> <com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga >> skal >> >> opsplittes >> >> hhv. byggeplads og skat >> >> Vedr. : Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg >> >> Bygning B >> >> SES Journal nr. : 42895-0001 >> >> SES Navision nr.: Navision 9800124 >> >> SES Ansvarlig : Martin Krøldrup Nielsen >> >> SES rådgiver : Friis & Moltke A/S >> >> Hermed fremsendes faktura på ekstra tømrerarbejde. >> >> Byggeplads Amtsgården B-4 >> >> jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note> >> >> >> >> i"m searching the word like rådgiver . When i see the result it is >> >> clearly >> >> searching for r dgiver. It is omitting the danish element. >> >> >> >> Please help me in this. >> >> >> >> >> >> >> >> Erick Erickson wrote: >> >> > >> >> > Are you *also* using the DutchAnalyzer for your *query*? >> >> > >> >> > Please show us the index and search code (simplified as much >> >> > as possible), then we'll be able to provide better suggestions. >> >> > >> >> > Also, tell us a bit more about your goals here. Is this an >> >> > index entirely of Dutch documents? Or is it a mixed-language >> >> > index? >> >> > >> >> > Think about getting a copy of Luke and >> >> > 1> examining your index to see what's *really* there >> >> > 2> examining the effects of using different parsers on >> >> > your *query*. >> >> > >> >> > Best >> >> > Erick >> >> > >> >> > On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla >> >> > <u...@mach.com>wrote: >> >> > >> >> >> >> >> >> Hi >> >> >> >> >> >> Thanks for your reply. >> >> >> >> >> >> I'm able to see the DutchAnalyzer. >> >> >> >> >> >> When i'm indexing my documents i given instace of DutchAnalyzer as >> an >> >> >> argument to IndexWriter Class. >> >> >> >> >> >> After this when i search for the >> >> >> http://www.nabble.com/file/p23170710/IndexFiles.java >> IndexFiles.java >> >> >> contains the danish elements .. Still it is not able to identify. >> >> >> >> >> >> Please tell me how to use DutchAnalzer in my application. Sample >> >> example >> >> >> or >> >> >> series of steps helps me. >> >> >> >> >> >> I also attached my index file(.java file). >> >> >> >> >> >> Please help me in this. please.. >> >> >> >> >> >> Erick Erickson wrote: >> >> >> > >> >> >> > Take a look at DutchAnalyzer. The problem you'll have is if >> you're >> >> >> > indexing >> >> >> > this document along with a bunch of documents from other >> languages. >> >> >> > You could search the mail archive for extensive discussions of >> >> >> indexing/ >> >> >> > searching documents from several languages. >> >> >> > >> >> >> > Best >> >> >> > Erick >> >> >> > >> >> >> > On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla >> >> >> > <u...@mach.com>wrote: >> >> >> > >> >> >> >> HI, >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I'm new to the lucene. I downloaded lucene 2.4.1. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I have one xml file which contains few special characters like >> 'å', >> >> >> 'ø,' >> >> >> >> °' >> >> >> >> etc.(these are Danish language elements). >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> How can I search these things. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Uday Kumar Reddy Maddigatla >> >> >> >> >> >> >> >> Software Engineer(Progrator|gatetrade) >> >> >> >> >> >> >> >> MACH India(Operations) >> >> >> >> >> >> >> >> Mobile: + 91-9963000377 >> >> >> >> >> >> >> >> uday.maddiga...@ness.com <mailto:uday.maddiga...@ness.com> >> >> >> >> >> >> >> >> u...@mach.com <mailto:u...@mach.com> >> >> >> >> >> >> >> >> www.ness.com >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> > >> >> >> >> >> >> -- >> >> >> View this message in context: >> >> >> >> >> >> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html >> >> >> Sent from the Lucene - Java Users mailing list archive at >> Nabble.com. >> >> >> >> >> >> >> >> >> >> --------------------------------------------------------------------- >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> >> >> >> >> > >> >> > >> >> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java >> >> http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java >> >> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java >> >> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java >> >> -- >> >> View this message in context: >> >> >> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23190583.html >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> >> > >> > >> http://www.nabble.com/file/p23211629/AnalysisDemo.java AnalysisDemo.java >> -- >> View this message in context: >> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23211629.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > -- View this message in context: http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23254287.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org