I suggest using a program called Luke (google it). You can then look into the index and see what is actually indexed. Let us know if you see all the words you would expect to see, and check whether you can find the document when you search from within Luke.
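In case it helps when reading the term list in Luke, here is a minimal Python sketch (not CLucene code, and only an illustration of the encoding question raised in the quoted thread) of the forms "orçamento" could take in an index. It assumes "ANSI" means Windows-1252 (`cp1252`), which the thread does not confirm, and the helper name `describe` is made up for this example. If one of the odd forms shows up as a term, a search for the plain word would not match it.

```python
import unicodedata

word = "or\u00e7amento"  # "orçamento" with a precomposed c-cedilla (U+00E7)

# The same word as bytes in two encodings.
utf8_bytes = word.encode("utf-8")    # c-cedilla becomes two bytes: C3 A7
ansi_bytes = word.encode("cp1252")   # assumption: "ANSI" == Windows-1252, one byte: E7

# UTF-8 bytes wrongly decoded as cp1252: the term a buggy loader would index.
mojibake = utf8_bytes.decode("cp1252")

# Decomposed form (plain c + combining cedilla U+0327): looks the same on
# screen but is a different string, hence a different term.
decomposed = unicodedata.normalize("NFD", word)


def describe(term):
    """Hypothetical helper: list the code points of a term for comparison."""
    return " ".join(f"U+{ord(ch):04X}" for ch in term)


if __name__ == "__main__":
    for label, term in [("expected", word),
                        ("mis-decoded", mojibake),
                        ("decomposed", decomposed)]:
        print(f"{label:12} {term!r:16} {describe(term)}")
```

The mis-decoded form comes out as "orÃ§amento", which is easy to spot in a term list. The decomposed form is harder to see by eye, so comparing code points is the safer check.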
handy program :)

cheers
ben

On Friday, April 23, 2010, Rui Oliveira <[email protected]> wrote:
>
> Itamar,
>
> All the tests were run on the same file. That file contains both
> "orçamento" and "administração"; "administração" is found and
> "orçamento" is not.
>
> The results are the same whether the file is ANSI, Unicode or UTF-8
> encoded. The problem is not in loading the files, because I debugged the
> text loaded from the file and it is OK.
>
> Rui
>
>
> From: [email protected]
> To: [email protected]
> Date: Fri, 23 Apr 2010 17:59:27 +0300
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Rui,
>
> This file is ANSI encoded. Are the other files, the ones you do succeed
> in finding, Unicode / UTF-8 encoded perhaps? If that's the case, your
> routine for loading the files is buggy. You should either have them all
> encoded using the same encoding, or have more intelligent code to convert
> incompatible encodings.
>
> HTH
>
> Itamar.
>
>
> From: Rui Oliveira [mailto:[email protected]]
> Sent: Friday, April 23, 2010 4:32 PM
> To: clucene-developers; [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> I have just attached the file.
>
> Tks, Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 09:22:05 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Can you send me this file that has both "orçamento" and "administração"?
>
> Or you can do a test: open the file and delete the ç from orçamento and
> administração, and then type ç again.
>
> Index again and try to search for both words again.
>
> On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]> wrote:
>
> They are text files (*.txt) and both words are in the same document.
> When I search for "orçamento" nothing is found, and when I search for
> "administração" the document is found.
>
> Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 09:09:30 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Seems like an encoding problem with these documents. Are they HTML pages?
> Are the words "orçamento" and "administração" in the same page, for
> example?
>
> Can you dump one of these files here? (One that has the problem and one
> that does not.)
>
> On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]> wrote:
>
> I am indexing several separate documents.
>
> The document that has these words is a small text document. It is indexed
> without any visible error, and it is found when I search for other words
> in it.
>
> Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 08:58:05 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> What are you indexing?
>
> Just a big document?
> Or a lot of separate documents? (HTML documents?)
>
> On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]> wrote:
>
> Hi Onilton,
>
> I have tested with "orcamento" instead of "orçamento" and didn't get
> anything.
>
> I do not know if Lucene indexes "orçamento" in a wrong way, because it
> indexes without any error, but when I search for it I do not get anything.
>
> Thanks & Regards,
> Rui
>
>
> From:
>
------------------------------------------------------------------------------
_______________________________________________
CLucene-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/clucene-developers
