Rui, which encoding do you use internally for the text before you pass it to CLucene? Perhaps it differs from the encoding CLucene expects.
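As an illustration of the kind of mismatch in question, here is a minimal Python sketch. It is not CLucene code and says nothing about what CLucene itself expects. It assumes that "ANSI" means Windows-1252, and the helper name `decode_with_fallback` is made up for this example. It shows how ANSI bytes read as UTF-8 turn into the replacement character U+FFFD, and how decoding with the right encoding first avoids that.

```python
# Illustration only: plain Python, not CLucene API.
# Assumption: "ANSI" here means Windows-1252 (cp1252).

def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: decode with the first encoding that fits."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, marking undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")


text = "orçamento administração"
ansi_bytes = text.encode("cp1252")   # what an ANSI-encoded file contains
utf8_bytes = text.encode("utf-8")    # what a UTF-8-encoded file contains

# Reading ANSI bytes as if they were UTF-8 produces the "strange character".
garbled = ansi_bytes.decode("utf-8", errors="replace")

# Decoding each file with a matching encoding gives the same text for both.
recovered_ansi = decode_with_fallback(ansi_bytes)
recovered_utf8 = decode_with_fallback(utf8_bytes)

print(repr(garbled))
print(recovered_ansi == recovered_utf8 == text)
```

The order of the candidate encodings matters in this sketch: UTF-8 is tried first because it rejects most non-UTF-8 input, whereas Windows-1252 accepts almost any byte sequence and would otherwise mask a wrong guess.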
Kind regards,
Veit

2010/4/26 Rui Oliveira <[email protected]>:
> Hi,
>
> I have been using luke to analyze index.
>
> Well, all Portuguese characters appear replaced by an strange character.
>
> What I can do to avoid this?
> It is not possible make clucene working with Portuguese characters?
>
> Thanks & Regards,
> Rui
>
>> Date: Fri, 23 Apr 2010 20:43:49 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>>
>> I suggest using a program called luke (google it). You can then look
>> into the index and see what is indexed. Let us know if u see all the
>> words you would expect to see. And see if u can find the document if u
>> search from luke
>>
>> handy program :)
>>
>> cheers
>> ben
>>
>> On Friday, April 23, 2010, Rui Oliveira <[email protected]> wrote:
>> >
>> > Itamar,
>> >
>> > The test results are made all them in same file. The same file have
>> > "orçamento" and "administração" and found "administração" and do not found
>> > "orçamento".
>> >
>> > The results are the same for a file in ANSI, Unicode or UTF8 encoded.
>> > The problem is not loading files because I debug the text loaded from file
>> > and this text are ok.
>> >
>> > Rui
>> >
>> >
>> > From: [email protected]
>> > To: [email protected]
>> > Date: Fri, 23 Apr 2010 17:59:27 +0300
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Rui,
>> >
>> > This file is ANSI encoded. Are the other files you do succeed in finding
>> > are Unicode / UTF8 encoded perhaps? If that's the case your routine for
>> > loading the files is buggy. You should either have them all encoded using
>> > the same encoding, or have more intelligent code to convert incompatible
>> > encoding.
>> >
>> > HTH
>> >
>> > Itamar.
>> >
>> >
>> > From: Rui Oliveira [mailto:[email protected]]
>> > Sent: Friday, April 23, 2010 4:32 PM
>> > To: clucene-developers; [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > I just attach the file.
>> >
>> > Tks, Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 09:22:05 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Can you send me this file that has both "orçamento" and "administração"?
>> >
>> > Or you can do a test: Open the file and delete the ç from orçamento and
>> > administração.
>> > And then type ç again.
>> >
>> > Index again and try to search both words again.
>> >
>> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > They are text file (*.txt) and both words are in same document.
>> > When I search for "orçamento" don't found anything and when I search for
>> > "administração" the document is found.
>> >
>> > Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 09:09:30 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Seems like an encoding problem with these documents. Are they html
>> > pages?
>> > Are the words "orçamento" and "administração" in the same page? for
>> > example?
>> >
>> > Can you dump one of these files here? (One that has the problem and one
>> > that has not)
>> >
>> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > I am indexing some separated documents.
>> >
>> > The document that have these words are a small text document. This
>> > document is indexed without any visible error. This same document is found
>> > when I search for other words on it.
>> >
>> > Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 08:58:05 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > What are you indexing?
>> >
>> > Just a big document?
>> > Or a lot of separate documents? (html documents?)
>> >
>> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > Hi Onilton,
>> >
>> > I have tested with "orcamento" instead of "orçamento" and didn't get
>> > anything.
>> >
>> > I do not know if lucene indexes "orçamento" in a wrong way, because
>> > indexes without any error, but when I search for it do not get anything.
>> >
>> > Thanks & Regards,
>> > Rui
>> >
>> >
>> > From:
>> >
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> CLucene-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/clucene-developers
