Rui,

Which encoding do you use internally before you pass the text to CLucene?
Maybe you are using an encoding different from the one CLucene
expects.
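
To illustrate the kind of mismatch I mean, here is a minimal sketch. It assumes a build where CLucene takes wide-character (wchar_t) strings, which may not match your setup. The helper names utf8_to_wide and latin1_to_wide are hypothetical, they are not CLucene functions. The point is only that the same word has different bytes in UTF-8 and in Latin-1, so the bytes must be decoded with the matching routine before the text is indexed or searched.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLucene): decode UTF-8 bytes into a wide
// string. Simplified: it checks lead and continuation bytes but does not
// reject overlong forms.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes that must follow
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: decode ISO-8859-1 (Latin-1) bytes, where every byte
// value is its own code point. Windows-1252 ("ANSI") differs from this only
// in the range 0x80-0x9F.
std::wstring latin1_to_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(in[i])));
    return out;
}
```

With these, utf8_to_wide("or\xC3\xA7" "amento") and latin1_to_wide("or\xE7" "amento") produce the same wide string, while passing the Latin-1 bytes to utf8_to_wide throws, because 0xE7 followed by 'a' is not a valid UTF-8 sequence.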

Kind regards,

Veit

2010/4/26 Rui Oliveira <[email protected]>:
> Hi,
>
> I have been using luke to analyze the index.
>
> Well, all Portuguese characters appear replaced by a strange character.
>
> What can I do to avoid this?
> Is it not possible to make CLucene work with Portuguese characters?
>
> Thanks & Regards,
> Rui
>
>
>
>> Date: Fri, 23 Apr 2010 20:43:49 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>>
>> I suggest using a program called luke (google it). You can then look
>> into the index and see what is indexed. Let us know if you see all the
>> words you would expect to see. And see if you can find the document if you
>> search from luke.
>>
>> handy program :)
>>
>> cheers
>> ben
>>
>> On Friday, April 23, 2010, Rui Oliveira <[email protected]> wrote:
>> >
>> >
>> >
>> >
>> >
>> > Itamar,
>> >
>> > All the test results come from the same file. That file contains both
>> > "orçamento" and "administração"; "administração" is found and "orçamento"
>> > is not.
>> >
>> > The results are the same whether the file is ANSI, Unicode or UTF8 encoded.
>> > The problem is not in loading the files, because I debugged the text loaded
>> > from the file and it is OK.
>> >
>> > Rui
>> >
>> >
>> >
>> >
>> > From: [email protected]
>> > To: [email protected]
>> > Date: Fri, 23 Apr 2010 17:59:27 +0300
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Rui,
>> >
>> > This file is ANSI encoded. Are the other files, the ones you do succeed in
>> > finding, perhaps Unicode / UTF8 encoded? If that's the case, your routine for
>> > loading the files is buggy. You should either have them all encoded using
>> > the same encoding, or have more intelligent code to convert incompatible
>> > encodings.
>> >
>> > HTH
>> >
>> > Itamar.
>> >
>> >
>> > From: Rui Oliveira [mailto:[email protected]]
>> > Sent: Friday, April 23, 2010 4:32 PM
>> > To: clucene-developers; [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> >
>> > I have just attached the file.
>> >
>> > Tks, Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 09:22:05 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Can you send me this file that has both "orçamento" and "administração"?
>> >
>> > Or you can do a test: open the file and delete the ç from "orçamento" and
>> > "administração".
>> > And then type ç again.
>> >
>> > Index again and try to search both words again.
>> >
>> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > They are text files (*.txt) and both words are in the same document.
>> > When I search for "orçamento" nothing is found, and when I search for
>> > "administração" the document is found.
>> >
>> >
>> > Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 09:09:30 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > Seems like an encoding problem with these documents. Are they HTML
>> > pages?
>> > Are the words "orçamento" and "administração" on the same page, for
>> > example?
>> >
>> > Can you dump one of these files here? (One that has the problem and one
>> > that does not)
>> >
>> >
>> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > I am indexing several separate documents.
>> >
>> > The document that has these words is a small text document. This
>> > document is indexed without any visible error. The same document is found
>> > when I search for other words in it.
>> >
>> >
>> > Rui
>> >
>> >
>> > From: [email protected]
>> > Date: Fri, 23 Apr 2010 08:58:05 -0400
>> > To: [email protected]
>> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>> >
>> > What are you indexing?
>> >
>> > Just one big document?
>> > Or a lot of separate documents? (HTML documents?)
>> >
>> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]>
>> > wrote:
>> >
>> > Hi Onilton,
>> >
>> > I have tested with "orcamento" instead of "orçamento" and didn't get
>> > anything.
>> >
>> > I do not know if lucene indexes "orçamento" in a wrong way, because
>> > it indexes without any error, but when I search for it I do not get anything.
>> >
>> > Thanks & Regards,
>> > Rui
>> >
>> >
>> > From:
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> CLucene-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/clucene-developers
>
