[
https://issues.apache.org/jira/browse/LUCENE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577309#action_12577309
]
Hiroaki Kawai commented on LUCENE-1032:
---------------------------------------
I think this feature should be merged into
https://issues.apache.org/jira/browse/LUCENE-1215
Unicode compatibility decomposition will fix this issue. :-)
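To illustrate the compatibility-decomposition point, here is a minimal sketch using the JDK's `java.text.Normalizer`. It assumes Java 6 or later, which may be newer than the JVM a given Lucene 2.x deployment runs on. The class name `HalfWidthKatakanaDemo` is hypothetical. The sketch uses NFKC (compatibility decomposition followed by canonical composition) rather than decomposition-only NFKD. With NFKC, a half-width KA followed by a half-width voiced sound mark becomes the single full-width GA instead of KA plus a combining mark.

```java
import java.text.Normalizer;

// Hypothetical demo class: shows NFKC folding half-width katakana to full width.
class HalfWidthKatakanaDemo {

    // Applies Unicode NFKC normalization to the input string.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfWidthKa = "\uFF76";        // HALFWIDTH KATAKANA LETTER KA
        String fullWidthKa = "\u30AB";        // KATAKANA LETTER KA
        // Half-width GA is written as half-width KA + HALFWIDTH VOICED SOUND MARK.
        String halfWidthGa = "\uFF76\uFF9E";
        String fullWidthGa = "\u30AC";        // KATAKANA LETTER GA

        System.out.println(halfWidthKa.equals(fullWidthKa));        // false
        System.out.println(nfkc(halfWidthKa).equals(fullWidthKa));  // true
        System.out.println(nfkc(halfWidthGa).equals(fullWidthGa));  // true

        // NFKD alone leaves the voiced mark as a separate combining character (U+3099).
        String nfkd = Normalizer.normalize(halfWidthGa, Normalizer.Form.NFKD);
        System.out.println(nfkd.equals("\u30AB\u3099"));            // true
    }
}
```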
> CJKAnalyzer should convert half width katakana to full width katakana
> ---------------------------------------------------------------------
>
> Key: LUCENE-1032
> URL: https://issues.apache.org/jira/browse/LUCENE-1032
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.0.0
> Reporter: Andrew Lynch
>
> Some of our Japanese customers are reporting errors when performing searches
> using half-width characters.
> The desired behavior is that a document containing half-width characters
> should be returned when performing a search using their full-width
> equivalents or when searching by the half-width characters themselves.
> Currently, a search will not return any matches for half-width characters.
> Here is a test case outlining the desired behavior (this may require a new
> Analyzer).
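> {code}
> import java.io.IOException;
> import java.io.Reader;
> import java.io.StringReader;
> import java.io.UnsupportedEncodingException;
>
> import junit.framework.TestCase;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.cjk.CJKAnalyzer;
>
> public class TestJapaneseEncodings extends TestCase
> {
>     // U+30AB KATAKANA LETTER KA, encoded as UTF-8
>     byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
>     // U+FF76 HALFWIDTH KATAKANA LETTER KA, encoded as UTF-8
>     byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};
>
>     // Desired: half-width input produces the full-width token.
>     public void testAnalyzerWithHalfWidth() throws IOException
>     {
>         Reader r1 = new StringReader(makeHalfWidthKa());
>         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>         assertNotNull(stream);
>         Token token = stream.next();
>         assertNotNull(token);
>         assertEquals(makeFullWidthKa(), token.termText());
>     }
>
>     // Full-width input must keep producing the full-width token.
>     public void testAnalyzerWithFullWidth() throws IOException
>     {
>         Reader r1 = new StringReader(makeFullWidthKa());
>         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>         assertEquals(makeFullWidthKa(), stream.next().termText());
>     }
>
>     private String makeFullWidthKa() throws UnsupportedEncodingException
>     {
>         return new String(fullWidthKa, "UTF-8");
>     }
>
>     private String makeHalfWidthKa() throws UnsupportedEncodingException
>     {
>         return new String(halfWidthKa, "UTF-8");
>     }
> }
> {code}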
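One possible way to get the behavior the test asks for without rewriting the tokenizer is to NFKC-normalize the text before it reaches the analyzer. The following is a minimal sketch under these assumptions: Java 6 or later for `java.text.Normalizer`, and a hypothetical helper class `NormalizingReaders` that is not part of Lucene. It has not been tried against CJKAnalyzer. The whole input is buffered because NFKC can merge adjacent characters, and a chunked, streaming version would need to carry state across reads to handle that.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.Normalizer;

// Hypothetical helper: NFKC-normalizes analyzer input before tokenization.
class NormalizingReaders {

    // Reads the entire input, applies NFKC, and returns the result as a new Reader.
    // Buffering everything avoids splitting sequences such as U+FF76 U+FF9E,
    // which NFKC composes into the single character U+30AC.
    // The caller still owns, and should close, the original Reader.
    static Reader nfkc(Reader in) throws IOException {
        return new StringReader(Normalizer.normalize(readAll(in), Normalizer.Form.NFKC));
    }

    // Drains a Reader into a String.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Half-width KA, then half-width KA + voiced mark: expect full-width KA, GA.
        String out = readAll(nfkc(new StringReader("\uFF76\uFF76\uFF9E")));
        System.out.println(out.equals("\u30AB\u30AC")); // true
    }
}
```

In the test above, `r1` would be wrapped as `NormalizingReaders.nfkc(r1)` before being passed to `new CJKAnalyzer().tokenStream("foo", ...)`. For searching to work, the same wrapping would have to happen at both index and query time, for example in a small custom Analyzer that overrides `tokenStream`. NFKC also folds other compatibility characters, such as full-width Latin letters to ASCII, which may or may not be desired.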