Hi Veit:

Thanks for your reply.

Do you know if there is anyone using CJKTokenizer?

Here is the code. I can't see any parsing error:

    Directory* store = NULL;
    store = _CLNEW RAMDirectory();
    const TCHAR* contentField = _T("contents");
    const TCHAR* test = _T("这里我们所论及的是有儿童参加的最初级足球协会red green");
    const TCHAR* cjk = _T("cjk");
    Document doc;

    IndexWriter* writer = NULL;
    IndexReader* reader = NULL;

    LanguageBasedAnalyzer* an = _CLNEW LanguageBasedAnalyzer(cjk, false);

    Query* q2 = NULL;
    Hits* h2 = NULL;
    const TCHAR* qry2 = _T("提");

    writer = _CLNEW IndexWriter(store, an, true);

    doc.add(*_CLNEW Field(contentField, test, Field::STORE_YES | Field::INDEX_TOKENIZED));

    writer->addDocument(&doc);
    writer->optimize();

    // Close and clean up
    writer->close();
    _CLLDELETE(writer);

    // verify the result
    reader =  IndexReader::open(store);
    IndexSearcher search(reader);

    q2 = QueryParser::parse(qry2, contentField, an);
    if ( q2 != NULL )
    {
        h2 = search.search( q2 );

        size_t a = h2->length();

        printf("second query is %d\n", (int)a);

        // the caller owns the Hits and Query objects
        _CLDELETE(h2);
        _CLDELETE(q2);
    }

    reader->close();


From: Veit Jahns [mailto:nuncupa...@googlemail.com]
Sent: Tuesday, 28 February 2012 12:12 AM
To: clucene-developers@lists.sourceforge.net
Subject: Re: [CLucene-dev] help ! - using LanguageBasedAnalyzer/CJKTokenizer returns wrong result

Hi Vivien,

maybe something is wrong with parsing the query. Is your query parsed correctly? 
What parsed query do you get from the QueryParser?

Kind regards,

Veit

2012/2/26 Vivien Meng <v.m...@qsr.com.au>
Hi:


I create an instance of the analyser class as follows:
LanguageBasedAnalyzer* analyser = new LanguageBasedAnalyzer(_T("cjk"), false);

With the parameter "cjk", I'd like to use the CJKTokenizer to handle Chinese 
characters.
However, the search result is not correct:

Say I have indexed the following contents:
这里我们所论及的是有儿童参加的最初级足球协会

Then I search for the character 提, which does not occur in the indexed 
contents. I expect the query to return 0 hits, but the actual result is 1.
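
For reference, here is a minimal sketch of why 0 hits would be expected. It 
assumes the CJK tokenizer behaves like a bigram tokenizer: it emits overlapping 
two-character tokens for each run of Chinese characters, and a single token for 
a run of length one. The helpers isCjk, cjkBigrams and containsToken are 
hypothetical and not part of CLucene. The sketch only covers the CJK Unified 
Ideographs block (U+4E00..U+9FFF) and skips all other text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not CLucene API): true for code points in the
// CJK Unified Ideographs block, U+4E00..U+9FFF.
static bool isCjk(char32_t c) {
    return c >= 0x4E00 && c <= 0x9FFF;
}

// Hypothetical helper (not CLucene API): emits overlapping two-character
// tokens for every run of CJK characters. A run of exactly one character
// yields that character as a single token. Non-CJK text is skipped.
static std::vector<std::u32string> cjkBigrams(const std::u32string& text) {
    std::vector<std::u32string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isCjk(text[i])) {
            ++i;
            continue;
        }
        // Find the end of the current CJK run [start, i).
        const std::size_t start = i;
        while (i < text.size() && isCjk(text[i])) {
            ++i;
        }
        if (i - start == 1) {
            tokens.push_back(text.substr(start, 1));
        }
        for (std::size_t j = start; j + 1 < i; ++j) {
            tokens.push_back(text.substr(j, 2));
        }
    }
    return tokens;
}

// Hypothetical helper: true if the token list contains exactly this token.
static bool containsToken(const std::vector<std::u32string>& tokens,
                          const std::u32string& token) {
    return std::find(tokens.begin(), tokens.end(), token) != tokens.end();
}
```

Under these assumptions, the indexed sentence yields only two-character tokens 
such as 这里, 里我 and 我们, while the query 提 yields the single token 提, so no 
indexed term should match it.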

Does anybody know whether CJKTokenizer works?  If not, is there another 
analyser/tokenizer that can deal with Chinese properly?

P.S. I am using CLucene in an Xcode project.


Thanks in advance.



Vivien Meng  Software Developer
QSR International Pty Ltd
2nd Floor, 651 Doncaster Road   |   Doncaster Victoria 3108 Australia
T  +61 3 9840 1100  F  +61 3 9840 1500
v.m...@qsrinternational.com   |   www.qsrinternational.com

_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers
