OK, this is a very different problem from the one you were originally asking about: effectively, "how to index/search mixed-language documents".
This topic has been discussed multiple times on the user list, so I think
your first step should be to search the archive. I *was* going to find the
old searchable mail archive, but those clever folks at Lucid Imagination have
something new, see:

http://www.lucidimagination.com/search/p:lucene?q=multiple+languages

Once you've had a chance to look that over I think you'll be off and running.

Best
Erick

On Thu, Apr 23, 2009 at 1:43 AM, uday kumar maddigatla <[email protected]> wrote:

>
> Hi
>
> Here are the details about my goals.
> 1. I want to use Lucene for mixed languages.
> 2. I want to build indexes of documents that are either English or
> Danish, etc.
> I'm attaching my IndexFiles.java file.
>
> When I'm searching, I'm giving the index path location as well as the
> documents folder.
>
> If I pass StandardAnalyzer as an argument to IndexWriter, it is able
> to search the English characters.
>
> How can I use DutchAnalyzer to make this IndexFiles.java index
> the Danish elements?
>
> In the code I attached, you can see 'C:\test3'. This is the location
> where I want to store my indexes.
>
> I'm giving the documents folder location as a command-line argument.
>
> In my document the content will be like this:
>
> <com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga skal
> opsplittes
> hhv. byggeplads og skat
> Vedr. : Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg
> Bygning B
> SES Journal nr. : 42895-0001
> SES Navision nr.: Navision 9800124
> SES Ansvarlig : Martin Krøldrup Nielsen
> SES rådgiver : Friis & Moltke A/S
> Hermed fremsendes faktura på ekstra tømrerarbejde.
> Byggeplads Amtsgården B-4
> jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note>
>
> I'm searching for a word like rådgiver. When I see the result, it is clearly
> searching for r dgiver. It is omitting the Danish element.
>
> Please help me with this.
>
>
>
> Erick Erickson wrote:
> >
> > Are you *also* using the DutchAnalyzer for your *query*?
> >
> > Please show us the index and search code (simplified as much
> > as possible), then we'll be able to provide better suggestions.
> >
> > Also, tell us a bit more about your goals here. Is this an
> > index entirely of Dutch documents? Or is it a mixed-language
> > index?
> >
> > Think about getting a copy of Luke and
> > 1> examining your index to see what's *really* there
> > 2> examining the effects of using different parsers on
> > your *query*.
> >
> > Best
> > Erick
> >
> > On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla
> > <[email protected]> wrote:
> >
> >>
> >> Hi
> >>
> >> Thanks for your reply.
> >>
> >> I'm able to see the DutchAnalyzer.
> >>
> >> When I'm indexing my documents, I give an instance of DutchAnalyzer as an
> >> argument to the IndexWriter class.
> >>
> >> After this, when I search for text that contains the Danish elements,
> >> it is still not able to identify them.
> >>
> >> http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java
> >>
> >> Please tell me how to use DutchAnalyzer in my application. A sample example
> >> or a series of steps would help me.
> >>
> >> I also attached my index file (.java file).
> >>
> >> Please help me with this.
> >>
> >> Erick Erickson wrote:
> >> >
> >> > Take a look at DutchAnalyzer. The problem you'll have is if you're
> >> > indexing this document along with a bunch of documents from other
> >> > languages. You could search the mail archive for extensive discussions
> >> > of indexing/searching documents from several languages.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla
> >> > <[email protected]> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I'm new to Lucene. I downloaded Lucene 2.4.1.
> >> >>
> >> >> I have one XML file which contains a few special characters like 'å',
> >> >> 'ø', '°', etc. (these are Danish language elements).
> >> >>
> >> >> How can I search these things?
> >> >>
> >> >> Uday Kumar Reddy Maddigatla
> >> >> Software Engineer (Progrator|gatetrade)
> >> >> MACH India (Operations)
> >> >> Mobile: + 91-9963000377
> >> >> [email protected]
> >> >> www.ness.com
> >> >>
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
> http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java
> --
> View this message in context:
> http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23190583.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
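
One thing that may be worth ruling out before picking an analyzer, offered as an assumption that nothing in this thread confirms: a query for "rådgiver" turning into "r dgiver" is the sort of symptom you can get when text is decoded with a different character set than it was written in, either when the files are read for indexing or when the query is typed. The sketch below uses only the JDK, not Lucene, and the names CharsetCheck and decode are made up for illustration. It shows that the same bytes give back the original word under UTF-8 but a longer, different string under ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Lucene or of the attached files.
class CharsetCheck {

    // Decode raw bytes with an explicitly named charset instead of
    // relying on the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "rådgiver", written with an escape so this source file's own
        // encoding cannot affect the demonstration.
        String word = "r\u00e5dgiver";

        // Assumption for this sketch: the document on disk is UTF-8.
        byte[] onDisk = word.getBytes(StandardCharsets.UTF_8);

        String sameCharset = decode(onDisk, StandardCharsets.UTF_8);
        String otherCharset = decode(onDisk, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip matches: " + sameCharset.equals(word));
        System.out.println("ISO-8859-1 decode matches: " + otherCharset.equals(word));
        System.out.println("lengths: " + word.length() + " vs " + otherCharset.length());
    }
}
```

If this does turn out to be the cause, the matching change would be to read the documents through an InputStreamReader constructed with an explicit charset, and to make sure the query string is decoded the same way. Looking at the indexed terms in Luke should show whether "rådgiver" made it into the index intact.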
