[
https://issues.apache.org/jira/browse/LUCENE-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir resolved LUCENE-3359.
---------------------------------
Resolution: Not A Problem
Measured in bytes, the average length of a Chinese word exceeds even that of
an English word, because CJK ideographs are encoded as multibyte sequences in
UTF-8 (three bytes each for the common ideographs). So front coding is still
very helpful.
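
A minimal sketch of the byte arithmetic behind this comment. It is not
Lucene's term dictionary code: the class and method names are hypothetical,
and the three sample terms were picked only for illustration. It measures
terms in UTF-8 bytes and counts the leading bytes each term shares with the
previous one, which is what front coding avoids storing again.

```java
import java.nio.charset.StandardCharsets;

public class FrontCodingSketch {

    // Length of a term in UTF-8 bytes (not in characters).
    public static int utf8Length(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of leading bytes that two encoded terms have in common.
    public static int sharedPrefixLength(byte[] previous, byte[] current) {
        int limit = Math.min(previous.length, current.length);
        int i = 0;
        while (i < limit && previous[i] == current[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Illustrative terms, written as Unicode escapes so the source file
        // encoding does not matter. They are already in UTF-8 byte order.
        String[] terms = {
            "\u4e2d\u56fd",        // two ideographs
            "\u4e2d\u56fd\u4eba",  // three ideographs, extends the first term
            "\u4e2d\u6587"         // two ideographs, shares only the first
        };

        byte[] previous = new byte[0];
        for (String term : terms) {
            byte[] current = term.getBytes(StandardCharsets.UTF_8);
            int shared = sharedPrefixLength(previous, current);
            System.out.println(term.length() + " chars, "
                    + current.length + " bytes, "
                    + shared + " bytes shared with previous term, "
                    + (current.length - shared) + " suffix bytes to store");
            previous = current;
        }
    }
}
```

With these sample terms, a two-character word already occupies six bytes,
and neighbouring terms share three or six leading bytes, so the shared
prefix is a large fraction of each term.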
> Option for no Front Encoding of term compression
> ------------------------------------------------
>
> Key: LUCENE-3359
> URL: https://issues.apache.org/jira/browse/LUCENE-3359
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 3.3
> Reporter: Gang Luo
> Priority: Minor
> Labels: Encoding, Front, compression
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> The average length of a word in English is 5.1 characters, so front encoding
> for term compression in the index is meaningful. But the average length of a
> word in Chinese is 2.3 characters. Is front encoding unnecessary for an index
> of Chinese documents?