Rui,

This file is ANSI encoded. Are the other files, the ones that are found successfully, perhaps Unicode/UTF-8 encoded? If so, your routine for loading the files is buggy. You should either encode all of them using the same encoding, or write smarter code that converts incompatible encodings.

HTH,
Itamar.
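Itamar's suggestion to bring every file to one encoding before indexing could look roughly like the following. This is a minimal, standalone C++ sketch, not CLucene code. It assumes "ANSI" here means ISO-8859-1/Windows-1252, where ç is the single byte 0xE7, while UTF-8 stores it as the two bytes 0xC3 0xA7. It also assumes a Unicode (wide-character) build of CLucene. The helpers `readFileBytes`, `looksLikeUtf8`, `decodeUtf8`, `decodeLatin1` and `toWide` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read a file as raw bytes, with no newline or encoding translation.
std::string readFileBytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Simplified UTF-8 check: validates lead/continuation byte structure only
// (overlong forms and surrogate code points are not rejected).
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                            // invalid lead byte
        if (bytes.size() - i <= extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// Decode UTF-8 into one wchar_t per code point. Expects well-formed input.
// With a 16-bit wchar_t (Windows), code points above U+FFFF would be
// truncated; Portuguese letters are unaffected.
std::wstring decodeUtf8(const std::string& bytes) {
    assert(looksLikeUtf8(bytes));
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        unsigned long cp;
        std::size_t extra;
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else                         { cp = c & 0x07; extra = 3; }
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}

// Decode ISO-8859-1: each byte maps to the code point with the same value.
std::wstring decodeLatin1(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes) out.push_back(static_cast<wchar_t>(c));
    return out;
}

// Use UTF-8 when the bytes are well-formed UTF-8, otherwise fall back to Latin-1.
std::wstring toWide(const std::string& bytes) {
    return looksLikeUtf8(bytes) ? decodeUtf8(bytes) : decodeLatin1(bytes);
}
```

Usage would be along the lines of `std::wstring text = toWide(readFileBytes("doc.txt"));` (the path is a placeholder), with `text` then handed to the existing indexing code. The detection is deliberately simple. A Latin-1 file whose bytes happen to form valid UTF-8 sequences would be misdetected, and the range 0x80 to 0x9F, where Windows-1252 differs from ISO-8859-1, is not remapped. What matters is that every file goes through the same decoding step before indexing.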
_____
From: Rui Oliveira [mailto:[email protected]]
Sent: Friday, April 23, 2010 4:32 PM
To: clucene-developers; [email protected]
Subject: Re: [CLucene-dev] Clucene search - Do not found some words

I have attached the file.

Thanks,
Rui

_____
From: [email protected]
Date: Fri, 23 Apr 2010 09:22:05 -0400
To: [email protected]
Subject: Re: [CLucene-dev] Clucene search - Do not found some words

Can you send me the file that contains both "orçamento" and "administração"?

Or you can run a test: open the file, delete the ç from "orçamento" and "administração", and then type the ç again. Reindex and try searching for both words again.

On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]> wrote:

They are text files (*.txt), and both words are in the same document. When I search for "orçamento" nothing is found, but when I search for "administração" the document is found.

Rui

_____
From: [email protected]
Date: Fri, 23 Apr 2010 09:09:30 -0400
To: [email protected]
Subject: Re: [CLucene-dev] Clucene search - Do not found some words

This looks like an encoding problem with these documents. Are they HTML pages? Are the words "orçamento" and "administração" in the same page, for example? Can you post a dump of these files here? (One that has the problem and one that does not.)

On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]> wrote:

I am indexing several separate documents. The document that contains these words is a small text document. It is indexed without any visible error, and it is found when I search for other words in it.

Rui

_____
From: [email protected]
Date: Fri, 23 Apr 2010 08:58:05 -0400
To: [email protected]
Subject: Re: [CLucene-dev] Clucene search - Do not found some words

What are you indexing? Just one big document, or a lot of separate documents (HTML documents)?

On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]> wrote:

Hi Onilton,

I have tested with "orcamento" instead of "orçamento" and didn't get anything. I don't know whether Lucene indexes "orçamento" incorrectly: it indexes without any error, but when I search for it I get nothing.

Thanks & Regards,
Rui

_____
From: [email protected]
Date: Fri, 23 Apr 2010 08:09:20 -0400
To: [email protected]
Subject: Re: [CLucene-dev] Clucene search - Do not found some words

If "importação" works, "orçamento" should work too, but I don't understand the problem yet. CLucene removes this kind of diacritic, so you should get "orcamento" instead of "orçamento". Where exactly does the problem happen? Does it happen when you search for "orçamento", or does CLucene index "orçamento" incorrectly?

On Fri, Apr 23, 2010 at 7:51 AM, Rui Oliveira <[email protected]> wrote:

I am using clucene-core-0.9.21b, and Lucene search does not find some Portuguese words such as "orçamento", "orçamentos" or "orça". Other Portuguese words with Portuguese characters, such as "administração", "relações" or "importação", work well. What could it be?

Thanks & Regards,
Rui

------------------------------------------------------------------------------
_______________________________________________
CLucene-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/clucene-developers
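Onilton's point that "orçamento" should end up as "orcamento" depends on which analyzer and filters are configured. The thread does not confirm what clucene-core-0.9.21b does by default. Purely as an illustration of accent folding, here is a standalone C++ sketch that maps the accented Portuguese letters to their base letters. `foldChar` and `foldAccents` are hypothetical helpers, not CLucene functions, and the mapping covers only the letters listed.

```cpp
#include <string>

// Hypothetical helper: fold accented Portuguese letters to their unaccented
// base letter. Any other character is returned unchanged.
wchar_t foldChar(wchar_t c) {
    switch (c) {
        case L'\u00E0': case L'\u00E1': case L'\u00E2': case L'\u00E3':
            return L'a';                                   // à á â ã
        case L'\u00C0': case L'\u00C1': case L'\u00C2': case L'\u00C3':
            return L'A';                                   // À Á Â Ã
        case L'\u00E7': return L'c';                       // ç
        case L'\u00C7': return L'C';                       // Ç
        case L'\u00E9': case L'\u00EA': return L'e';       // é ê
        case L'\u00C9': case L'\u00CA': return L'E';       // É Ê
        case L'\u00ED': return L'i';                       // í
        case L'\u00CD': return L'I';                       // Í
        case L'\u00F3': case L'\u00F4': case L'\u00F5':
            return L'o';                                   // ó ô õ
        case L'\u00D3': case L'\u00D4': case L'\u00D5':
            return L'O';                                   // Ó Ô Õ
        case L'\u00FA': case L'\u00FC': return L'u';       // ú ü
        case L'\u00DA': case L'\u00DC': return L'U';       // Ú Ü
        default: return c;
    }
}

// Apply foldChar to every character of the input string.
std::wstring foldAccents(const std::wstring& in) {
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t c : in) out.push_back(foldChar(c));
    return out;
}
```

Whatever folding is used has to run over both the indexed text and the query text. If only one side is folded, a query typed with ç may not match the indexed term. Folding also only helps once the file has been decoded with the correct encoding: a ç read from the wrong bytes will never be recognized as U+00E7 in the first place.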
