otis        2004/03/02 05:56:03

  Modified:    contributions/analyzers/src/java/org/apache/lucene/analysis/cn
                        ChineseTokenizer.java
  Log:
  - Added documentation

  Revision  Changes    Path
  1.4       +18 -1     jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cn/ChineseTokenizer.java

  Index: ChineseTokenizer.java
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cn/ChineseTokenizer.java,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- ChineseTokenizer.java	22 Jan 2004 20:54:47 -0000	1.3
  +++ ChineseTokenizer.java	2 Mar 2004 13:56:03 -0000	1.4
  @@ -64,6 +64,23 @@
    * Rule: A Chinese character as a single token
    * Copyright: Copyright (c) 2001
    * Company:
  + *
  + * The difference between the ChineseTokenizer and the
  + * CJKTokenizer (id=23545) is that they have different
  + * token parsing logic.
  + *
  + * For example, if a Chinese text "C1C2C3C4" is to be
  + * indexed, the tokens returned from the ChineseTokenizer
  + * are C1, C2, C3, C4, and the tokens returned from the
  + * CJKTokenizer are C1C2, C2C3, C3C4.
  + *
  + * Therefore the index created by the CJKTokenizer is much
  + * larger.
  + *
  + * The problem is that when searching for C1, C1C2, C1C3,
  + * C4C2, C1C2C3 ... the ChineseTokenizer works, but the
  + * CJKTokenizer will not.
  + *
    * @author Yiyi Sun
    * @version 1.0
    *
  @@ -149,4 +166,4 @@
           }
       }

  -}
  \ No newline at end of file
  +}
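For readers who want to see the difference described in the new javadoc directly, here is a minimal, untested sketch (not part of this commit) that prints the tokens each tokenizer produces for the same four-character input. It assumes the sandbox analyzers jar is on the classpath, that the CJKTokenizer from attachment id=23545 lives in org.apache.lucene.analysis.cjk, and it uses the TokenStream API of this era (next() returning a Token, Token.termText()); the class name TokenizerComparison is made up for the example.

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.ChineseTokenizer;
import org.apache.lucene.analysis.cjk.CJKTokenizer; // assumed package for the id=23545 tokenizer

public class TokenizerComparison {

    // Drain a token stream and print each term, mirroring the C1/C2/... notation in the javadoc.
    private static void dump(String label, TokenStream stream) throws IOException {
        System.out.print(label + ":");
        for (Token t = stream.next(); t != null; t = stream.next()) {
            System.out.print(" [" + t.termText() + "]");
        }
        System.out.println();
        stream.close();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4E2D\u534E\u4EBA\u6C11"; // four Chinese characters, i.e. C1C2C3C4

        // ChineseTokenizer: one token per character -> C1, C2, C3, C4
        dump("ChineseTokenizer", new ChineseTokenizer(new StringReader(text)));

        // CJKTokenizer: overlapping bigrams -> C1C2, C2C3, C3C4
        dump("CJKTokenizer", new CJKTokenizer(new StringReader(text)));
    }
}

The first line of output is the per-character (unigram) stream, the second is the overlapping bigram stream, which is why the CJK index is larger and why single-character or non-adjacent combinations only match against the ChineseTokenizer output.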