I suggest using a program called Luke (google it). You can then look
into the index and see what is indexed. Let us know if you see all the
words you would expect to see, and whether you can find the document
when you search from Luke.
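
If the term in the index looks almost right but still does not match the
query, it can help to compare the code points on both sides. Here is a
minimal sketch of that check. It is not CLucene code: the helper names
decodeLatin1, decodeUtf8 and formatCodePoints are made up for illustration,
and it assumes "ANSI" means Latin-1 (ISO-8859-1), which agrees with
Windows-1252 for "ç" and "ã" but not for bytes 0x80-0x9F.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Latin-1: every byte is the code point with the same value.
std::vector<uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// UTF-8: 1 to 4 bytes per code point. Malformed sequences become U+FFFD.
// Simplified: overlong forms and surrogates are not rejected.
std::vector<uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}

// Render code points as "U+XXXX U+XXXX ..." so two strings can be compared.
std::string formatCodePoints(const std::vector<uint32_t>& cps) {
    std::string s;
    char buf[16];
    for (std::size_t i = 0; i < cps.size(); ++i) {
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cps[i]));
        if (i) s += ' ';
        s += buf;
    }
    return s;
}
```

Run the bytes read from the file and the bytes of the query string through
the decoder that matches their encoding and compare the output. The UTF-8
bytes C3 A7 decoded as UTF-8 give U+00E7 ("ç"), the same as the Latin-1 byte
E7 decoded as Latin-1. The same two bytes decoded as Latin-1 give
U+00C3 U+00A7, which would never match "ç" in a query.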

handy program :)

cheers
ben

On Friday, April 23, 2010, Rui Oliveira <[email protected]> wrote:
>
> Itamar,
>
> All the tests were run on the same file. That file contains both
> "orçamento" and "administração"; "administração" is found and
> "orçamento" is not.
>
> The results are the same whether the file is ANSI, Unicode or UTF-8
> encoded. The problem is not in loading the files, because I debugged the
> text loaded from the file and it is correct.
>
> Rui
>
>
>
>
> From: [email protected]
> To: [email protected]
> Date: Fri, 23 Apr 2010 17:59:27 +0300
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Rui,
>
> This file is ANSI encoded. Are the other files, the ones you do succeed in
> finding, perhaps Unicode / UTF-8 encoded? If that's the case, your routine
> for loading the files is buggy. You should either have them all encoded
> using the same encoding, or have more intelligent code to convert between
> incompatible encodings.
>
> HTH
>
> Itamar.
>
>
> From: Rui Oliveira [mailto:[email protected]]
> Sent: Friday, April 23, 2010 4:32 PM
> To: clucene-developers; [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
>
> I have just attached the file.
>
> Tks, Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 09:22:05 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Can you send me the file that has both "orçamento" and "administração"?
>
> Or you can do a test: open the file and delete the ç from "orçamento" and
> "administração", and then type the ç again.
>
> Then index again and try searching for both words again.
>
> On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]> wrote:
>
> They are text files (*.txt) and both words are in the same document.
> When I search for "orçamento" nothing is found, and when I search for
> "administração" the document is found.
>
>
> Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 09:09:30 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Seems like an encoding problem with these documents. Are they HTML pages?
> Are the words "orçamento" and "administração" in the same page, for example?
>
> Can you post one of these files here? (One that has the problem and one
> that does not.)
>
>
> On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]> wrote:
>
> I am indexing several separate documents.
>
> The document that has these words is a small text document. It is indexed
> without any visible error, and the same document is found when I search for
> other words in it.
>
>
> Rui
>
>
> From: [email protected]
> Date: Fri, 23 Apr 2010 08:58:05 -0400
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> What are you indexing?
>
> Just a big document?
> Or a lot of separate documents? (HTML documents?)
>
> On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]> wrote:
>
> Hi Onilton,
>
> I have tested with "orcamento" instead of "orçamento" and didn't get anything.
>
> I do not know whether Lucene indexes "orçamento" incorrectly: indexing
> finishes without any error, but when I search for the word I get nothing.
>
> Thanks & Regards,
> Rui
>
>
> From:
>

------------------------------------------------------------------------------
_______________________________________________
CLucene-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/clucene-developers
