How can I check this?
I just read the text from the files into a CString and then pass it to CLucene.
The text I read from the file into the CString appears to be correct; I have
checked it in debug mode and it looks good.
Rui
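
One way to check this, as a rough sketch only: dump the numeric value of every
character right before the string is handed to CLucene, instead of relying on
how the debugger renders it. The sketch assumes a Unicode build, where the
CString holds wchar_t and its contents can be copied into a std::wstring.
dumpCodeUnits is a hypothetical helper, not part of CLucene or MFC.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of CLucene or MFC): renders every code unit
// of a wide string as hex, separated by spaces, so the stored values can be
// compared against the expected Unicode code points.
std::string dumpCodeUnits(const std::wstring& text) {
    std::string out;
    char buf[16];
    for (wchar_t ch : text) {
        // Cast to unsigned long so the value prints the same way whether
        // wchar_t is 16 bits (Windows) or 32 bits (most Unix systems).
        int n = std::snprintf(buf, sizeof buf, "%04lX",
                              static_cast<unsigned long>(ch));
        assert(n > 0 && n < static_cast<int>(sizeof buf));
        (void)n;  // silence unused-variable warnings when NDEBUG is defined
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

With a CString named text (an assumed variable name), the call could look like
dumpCodeUnits(std::wstring(text.GetString())). In "orçamento" the ç should
come out as the single value 00E7. If the pair 00C3 00A7 shows up instead, the
UTF-8 bytes of the file were widened one byte at a time, and the text was
already damaged before it reached CLucene.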
> Date: Mon, 26 Apr 2010 14:44:56 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Rui,
>
> which encoding do you use internally before you give it to CLucene?
> Maybe you use an encoding different to the encoding expected by
> CLucene.
>
> Kind regards,
>
> Veit
>
> 2010/4/26 Rui Oliveira <[email protected]>:
> > Hi,
> >
> > I have been using Luke to analyze the index.
> >
> > All Portuguese characters appear replaced by a strange character.
> >
> > What can I do to avoid this?
> > Is it not possible to make CLucene work with Portuguese characters?
> >
> > Thanks & Regards,
> > Rui
> >
> >
> >
> >> Date: Fri, 23 Apr 2010 20:43:49 +0200
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >>
> >> I suggest using a program called luke (google it). You can then look
> >> into the index and see what is indexed. Let us know if u see all the
> >> words you would expect to see. And see if u can find the document if u
> >> search from luke
> >>
> >> handy program :)
> >>
> >> cheers
> >> ben
> >>
> >> On Friday, April 23, 2010, Rui Oliveira <[email protected]> wrote:
> >> >
> >> > Itamar,
> >> >
> >> > All the tests were run against the same file. That file contains both
> >> > "orçamento" and "administração"; "administração" is found and
> >> > "orçamento" is not.
> >> >
> >> > The results are the same whether the file is ANSI, Unicode or UTF-8
> >> > encoded. The problem is not in loading the files, because I debugged the
> >> > text loaded from the file and it is OK.
> >> >
> >> > Rui
> >> >
> >> >
> >> >
> >> >
> >> > From: [email protected]
> >> > To: [email protected]
> >> > Date: Fri, 23 Apr 2010 17:59:27 +0300
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Rui,
> >> >
> >> > This file is ANSI encoded. Are the other files, the ones you do succeed
> >> > in finding, Unicode / UTF8 encoded perhaps? If that's the case, your
> >> > routine for loading the files is buggy. You should either have them all
> >> > encoded using the same encoding, or have more intelligent code to convert
> >> > incompatible encodings.
> >> >
> >> > HTH
> >> >
> >> > Itamar.
> >> >
> >> >
> >> > From: Rui Oliveira [mailto:[email protected]]
> >> > Sent: Friday, April 23, 2010 4:32 PM
> >> > To: clucene-developers; [email protected]
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> >
> >> > I have just attached the file.
> >> >
> >> > Tks, Rui
> >> >
> >> >
> >> > From: [email protected]
> >> > Date: Fri, 23 Apr 2010 09:22:05 -0400
> >> > To: [email protected]
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Can you send me this file that has both "orçamento" and "administração"?
> >> >
> >> > Or you can do a test: open the file and delete the ç from "orçamento" and
> >> > "administração".
> >> > Then type the ç again.
> >> >
> >> > Index again and try to search both words again.
> >> >
> >> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <[email protected]>
> >> > wrote:
> >> >
> >> > They are text files (*.txt) and both words are in the same document.
> >> > When I search for "orçamento" nothing is found, and when I search for
> >> > "administração" the document is found.
> >> >
> >> >
> >> > Rui
> >> >
> >> >
> >> > From: [email protected]
> >> > Date: Fri, 23 Apr 2010 09:09:30 -0400
> >> > To: [email protected]
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > Seems like an encoding problem with these documents. Are they HTML
> >> > pages? Are the words "orçamento" and "administração" in the same page,
> >> > for example?
> >> >
> >> > Can you dump one of these files here? (One that has the problem and one
> >> > that has not)
> >> >
> >> >
> >> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <[email protected]>
> >> > wrote:
> >> >
> >> > I am indexing several separate documents.
> >> >
> >> > The document that has these words is a small text document. It is
> >> > indexed without any visible error, and the same document is found when I
> >> > search for other words in it.
> >> >
> >> >
> >> > Rui
> >> >
> >> >
> >> > From: [email protected]
> >> > Date: Fri, 23 Apr 2010 08:58:05 -0400
> >> > To: [email protected]
> >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >> >
> >> > What are you indexing?
> >> >
> >> > Just a big document?
> >> > Or a lot of separate documents? (HTML documents?)
> >> >
> >> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <[email protected]>
> >> > wrote:
> >> >
> >> > Hi Onilton,
> >> >
> >> > I have tested with "orcamento" instead of "orçamento" and didn't get
> >> > anything.
> >> >
> >> > I do not know whether Lucene indexes "orçamento" incorrectly: it indexes
> >> > without any error, but when I search for the word I do not get anything.
> >> >
> >> > Thanks & Regards,
> >> > Rui
> >> >
> >> >
> >> > From:
> >> >
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> _______________________________________________
> >> CLucene-developers mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/clucene-developers
> >
>