Shouldn't ls_text be a TCHAR string? (I'm asking the other people reading this thread.)
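Some context for that question. In the quoted m_FileDocument, _tcscpy copies into a char*. That only compiles when _UNICODE is not defined, so the MFC project is presumably a non-Unicode (MBCS) build and ls_text holds raw ANSI bytes. If this CLucene build defines TCHAR as wchar_t (an assumption worth checking in the build settings), those bytes have to be converted to wide characters with the right code page before they reach Field. Below is a minimal portable sketch. It assumes the .txt files are Windows-1252, the usual "ANSI" code page on Portuguese Windows. DecodeCp1252 is a hypothetical name, not a CLucene or MFC function. On Windows itself, MultiByteToWideChar with code page 1252 (or CP_ACP) does the same job.

```cpp
#include <string>

// Hypothetical helper (not part of CLucene or MFC): decode Windows-1252
// ("ANSI" on Western/Portuguese Windows) bytes into a wide string.
std::wstring DecodeCp1252(const std::string& bytes) {
    // Unicode code points for bytes 0x80..0x9F; 0xFFFD marks the five
    // bytes that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    static const unsigned short kHigh[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178};
    std::wstring out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes) {
        if (c >= 0x80 && c <= 0x9F) {
            out += static_cast<wchar_t>(kHigh[c - 0x80]);
        } else {
            // ASCII and 0xA0..0xFF (including c-cedilla 0xE7 and a-tilde 0xE3)
            // map to the Unicode code point with the same value.
            out += static_cast<wchar_t>(c);
        }
    }
    return out;
}
```

In m_FileDocument, one might then read the file as bytes, call DecodeCp1252, and pass the result's c_str() to Field. That step again assumes TCHAR is wchar_t in that build. Separately, the quoted code allocates lcl_Path with new[] but frees it with delete. That should be delete[].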
On Mon, Apr 26, 2010 at 9:58 AM, Rui Oliveira <ruifra...@hotmail.com> wrote:
> void c_IndexEx::m_Add(CString avs_codRevsId)
> {
>     CString ls_origem = "c_IndexEx::m_Add";
>
>     try
>     {
>         m_InitVariables();
>
>         if(!ii_enmIndx)
>             return;
>
>         IndexWriter* writer = NULL;
>         lucene::analysis::standard::StandardAnalyzer an;
>
>         if ( IndexReader::indexExists(iclp_indexPath) ){
>             if ( IndexReader::isLocked(iclp_indexPath) )
>             {
>                 m_AppendLog("Index was locked... unlocking it.");
>
>                 IndexReader::unlock(iclp_indexPath);
>             }
>
>             writer = _CLNEW IndexWriter( iclp_indexPath, &an, false);
>         }
>         else
>         {
>             writer = _CLNEW IndexWriter( iclp_indexPath ,&an, true);
>         }
>         writer->setMaxFieldLength(IndexWriter::DEFAULT_MAX_FIELD_LENGTH);
>         writer->setUseCompoundFile(true);
>
>         uint64_t str = lucene::util::Misc::currentTimeMillis();
>
>         // make a new, empty document
>         Document* lcl_doc = _CLNEW Document();
>         if(m_FileDocument( avs_codRevsId, lcl_doc ))
>         {
>             writer->addDocument( lcl_doc );
>         }
>         _CLDELETE(lcl_doc);
>
>         writer->optimize();
>         writer->close();
>         _CLDELETE(writer);
>     }
>     catch(CLuceneError& err)
>     {
>         // e->Delete();
>         return;
>     }
>     catch( CException* e )
>     {
>         // e->Delete();
>         m_AppendLog(ls_origem);
>         return;
>     }
>     catch(...)
>     {
>         // e->Delete();
>         return;
>     }
> }
>
> BOOL c_IndexEx::m_FileDocument(CString avs_codRevsId, Document* arcl_doc)
> {
>     // make a new, empty document
>     CString ls_codDocmId;
>     CString ls_Path = m_GetFilePath(avs_codRevsId, &ls_codDocmId);
>     if(ls_Path.IsEmpty())
>     {
>         return FALSE;
>     }
>     char* lcl_Path = NULL;
>     lcl_Path = new char[ls_Path.GetLength()+1];
>     _tcscpy(lcl_Path, ls_Path);
>
>     CString ls_text;
>     m_GetFileContents(lcl_Path, &ls_text);
>     arcl_doc->add( *_CLNEW Field(_T("contents"), ls_text, Field::STORE_YES | Field::INDEX_TOKENIZED) );
>
>     icl_file.m_DeleteFile(ls_Path);
>
>     // return the document
>     delete lcl_Path;
>     return TRUE;
> }
>
> ------------------------------
> From: oniltonmac...@gmail.com
> Date: Mon, 26 Apr 2010 10:36:45 -0300
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
>
> Can you send the code where you index?
>
> On Mon, Apr 26, 2010 at 9:55 AM, Rui Oliveira <ruifra...@hotmail.com> wrote:
>
> How can I check this?
>
> I just read the text from the files into a CString and then put it into CLucene.
>
> Apparently the text I get from the file into the CString is right; I have checked it in debug mode and it looks good.
>
> Rui
>
> > Date: Mon, 26 Apr 2010 14:44:56 +0200
> > From: nuncupa...@googlemail.com
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> >
> > Rui,
> >
> > which encoding do you use internally before you give it to CLucene?
> > Maybe you use an encoding different to the encoding expected by
> > CLucene.
> >
> > Kind regards,
> >
> > Veit
> >
> > 2010/4/26 Rui Oliveira <ruifra...@hotmail.com>:
> > > Hi,
> > >
> > > I have been using Luke to analyze the index.
> > >
> > > Well, all Portuguese characters appear replaced by a strange character.
> > >
> > > What can I do to avoid this?
> > > Is it not possible to make CLucene work with Portuguese characters?
> > >
> > > Thanks & Regards,
> > > Rui
> > >
> > >> Date: Fri, 23 Apr 2010 20:43:49 +0200
> > >> From: bvanklin...@gmail.com
> > >> To: clucene-developers@lists.sourceforge.net
> > >> Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >>
> > >> I suggest using a program called Luke (google it). You can then look
> > >> into the index and see what is indexed. Let us know if you see all the
> > >> words you would expect to see, and whether you can find the document
> > >> when you search from Luke.
> > >>
> > >> Handy program :)
> > >>
> > >> cheers
> > >> ben
> > >>
> > >> On Friday, April 23, 2010, Rui Oliveira <ruifra...@hotmail.com> wrote:
> > >> > Itamar,
> > >> >
> > >> > All the tests were made with the same file. That file contains both
> > >> > "orçamento" and "administração"; "administração" is found and
> > >> > "orçamento" is not.
> > >> >
> > >> > The results are the same whether the file is ANSI, Unicode or UTF-8 encoded.
> > >> > The problem is not the file loading: I debugged the text loaded from the
> > >> > file and it is OK.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: ita...@divrei-tora.com
> > >> > To: clucene-developers@lists.sourceforge.net
> > >> > Date: Fri, 23 Apr 2010 17:59:27 +0300
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Rui,
> > >> >
> > >> > This file is ANSI encoded. Are the other files you do succeed in finding
> > >> > Unicode / UTF-8 encoded, perhaps? If that's the case, your routine for
> > >> > loading the files is buggy. You should either have them all encoded using
> > >> > the same encoding, or have more intelligent code to convert incompatible
> > >> > encodings.
> > >> >
> > >> > HTH
> > >> >
> > >> > Itamar.
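Inline note on Itamar's suggestion above to either use one encoding for all files or convert incompatible encodings. A loader could normalize every file to one wide string before indexing. Below is a minimal portable sketch of such a loader. DecodeTextFile and DecodeUtf8 are hypothetical names, not part of CLucene or MFC. The detection rules are assumptions:

* an FF FE byte-order mark means UTF-16LE, which is what classic Windows Notepad saves as "Unicode";
* an EF BB BF mark, or otherwise well-formed UTF-8, means UTF-8;
* anything else is treated as ANSI.

The ANSI fallback uses ISO-8859-1. That agrees with Windows-1252 for every Portuguese letter, but not for bytes 0x80 to 0x9F.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into a wide string.
// Returns false (leaving 'out' partially filled) on any malformed sequence.
bool DecodeUtf8(const std::string& in, std::wstring& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;
        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else return false;                         // invalid lead byte
        if (in.size() - i <= extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if ((extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
            return false;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (Windows): store as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: guess the encoding of a text file's raw bytes
// and return its contents as a wide string.
std::wstring DecodeTextFile(const std::string& bytes) {
    std::wstring out;
    // FF FE byte-order mark: UTF-16LE ("Unicode" in classic Windows Notepad).
    if (bytes.size() >= 2 && static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        // Copies UTF-16 code units as-is (right for Windows' 16-bit wchar_t);
        // a trailing odd byte is ignored.
        for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
            unsigned lo = static_cast<unsigned char>(bytes[i]);
            unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
            out += static_cast<wchar_t>(lo | (hi << 8));
        }
        return out;
    }
    // Skip an EF BB BF (UTF-8) byte-order mark if present.
    std::size_t start = 0;
    if (bytes.size() >= 3 && static_cast<unsigned char>(bytes[0]) == 0xEF &&
        static_cast<unsigned char>(bytes[1]) == 0xBB &&
        static_cast<unsigned char>(bytes[2]) == 0xBF)
        start = 3;
    const std::string body = bytes.substr(start);
    if (DecodeUtf8(body, out)) return out;
    // "ANSI" fallback, assumed ISO-8859-1: each byte becomes the code point
    // with the same value (matches Windows-1252 for letters such as ç and ã).
    out.clear();
    for (unsigned char c : body) out += static_cast<wchar_t>(c);
    return out;
}
```

The UTF-8 check is a heuristic. A Windows-1252 file could in principle form valid UTF-8 by accident. However, a ç (0xE7) followed by ASCII letters, as in "orçamento", never does, so typical Portuguese ANSI text falls through to the fallback.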
> > >> >
> > >> > From: Rui Oliveira [mailto:ruifra...@hotmail.com]
> > >> > Sent: Friday, April 23, 2010 4:32 PM
> > >> > To: clucene-developers; oniltonmac...@gmail.com
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > I just attached the file.
> > >> >
> > >> > Tks, Rui
> > >> >
> > >> > From: oniltonmac...@gmail.com
> > >> > Date: Fri, 23 Apr 2010 09:22:05 -0400
> > >> > To: clucene-developers@lists.sourceforge.net
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Can you send me this file that has both "orçamento" and "administração"?
> > >> >
> > >> > Or you can do a test: open the file, delete the ç from "orçamento" and
> > >> > "administração", and then type the ç again.
> > >> >
> > >> > Then index again and search for both words again.
> > >> >
> > >> > On Fri, Apr 23, 2010 at 9:14 AM, Rui Oliveira <ruifra...@hotmail.com> wrote:
> > >> >
> > >> > They are text files (*.txt) and both words are in the same document.
> > >> > When I search for "orçamento" nothing is found, and when I search for
> > >> > "administração" the document is found.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: oniltonmac...@gmail.com
> > >> > Date: Fri, 23 Apr 2010 09:09:30 -0400
> > >> > To: clucene-developers@lists.sourceforge.net
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > Seems like an encoding problem with these documents. Are they HTML
> > >> > pages? Are the words "orçamento" and "administração" in the same page,
> > >> > for example?
> > >> >
> > >> > Can you dump one of these files here? (One that has the problem and one
> > >> > that does not.)
> > >> >
> > >> > On Fri, Apr 23, 2010 at 9:05 AM, Rui Oliveira <ruifra...@hotmail.com> wrote:
> > >> >
> > >> > I am indexing some separate documents.
> > >> >
> > >> > The document that has these words is a small text document. It is
> > >> > indexed without any visible error, and it is found when I search for
> > >> > other words in it.
> > >> >
> > >> > Rui
> > >> >
> > >> > From: oniltonmac...@gmail.com
> > >> > Date: Fri, 23 Apr 2010 08:58:05 -0400
> > >> > To: clucene-developers@lists.sourceforge.net
> > >> > Subject: Re: [CLucene-dev] Clucene search - Do not found some words
> > >> >
> > >> > What are you indexing?
> > >> >
> > >> > Just one big document?
> > >> > Or a lot of separate documents? (HTML documents?)
> > >> >
> > >> > On Fri, Apr 23, 2010 at 8:54 AM, Rui Oliveira <ruifra...@hotmail.com> wrote:
> > >> >
> > >> > Hi Onilton,
> > >> >
> > >> > I have tested with "orcamento" instead of "orçamento" and didn't get
> > >> > anything.
> > >> >
> > >> > I do not know whether Lucene indexes "orçamento" the wrong way: it
> > >> > indexes without any error, but when I search for it I do not get anything.
> > >> >
> > >> > Thanks & Regards,
> > >> > Rui
> > >> >
> > >> > From:
> > >>
> > >> ------------------------------------------------------------------------------
> > >> _______________________________________________
> > >> CLucene-developers mailing list
> > >> CLucene-developers@lists.sourceforge.net
> > >> https://lists.sourceforge.net/lists/listinfo/clucene-developers