CJKAnalyzer should convert half width katakana to full width katakana
---------------------------------------------------------------------

                 Key: LUCENE-1032
                 URL: https://issues.apache.org/jira/browse/LUCENE-1032
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Andrew Lynch

Some of our Japanese customers are reporting errors when searching with half-width characters. The desired behavior is that a document containing half-width characters is returned both when searching with the full-width equivalents and when searching with the half-width characters themselves. Currently, a search returns no matches for half-width characters.

Here is a test case outlining the desired behavior (this may require a new Analyzer).

{code}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;

public class TestJapaneseEncodings extends TestCase {

    // UTF-8 bytes for U+30AB (full-width katakana KA)
    byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
    // UTF-8 bytes for U+FF76 (half-width katakana KA)
    byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};

    // Half-width input should be analyzed to the full-width term.
    public void testAnalyzerWithHalfWidth() throws IOException {
        Reader r1 = new StringReader(makeHalfWidthKa());
        TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
        assertNotNull(stream);
        Token token = stream.next();
        assertNotNull(token);
        assertEquals(makeFullWidthKa(), token.termText());
    }

    // Full-width input should pass through unchanged.
    public void testAnalyzerWithFullWidth() throws IOException {
        Reader r1 = new StringReader(makeFullWidthKa());
        TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
        assertEquals(makeFullWidthKa(), stream.next().termText());
    }

    private String makeFullWidthKa() throws UnsupportedEncodingException {
        return new String(fullWidthKa, "UTF-8");
    }

    private String makeHalfWidthKa() throws UnsupportedEncodingException {
        return new String(halfWidthKa, "UTF-8");
    }
}
{code}
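
For illustration only, here is a minimal sketch of the kind of width folding the test case above expects. It is not a patch against CJKAnalyzer. It assumes a JDK that ships `java.text.Normalizer` (Java 6 or later), and the class name `WidthFolding` and method `toFullWidth` are hypothetical names chosen for this example. Unicode NFKC normalization maps half-width katakana such as U+FF76 to the full-width equivalent U+30AB.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: folds half-width katakana to full width.
final class WidthFolding {

    // NFKC applies Unicode compatibility mappings, which cover the
    // half-width katakana range (U+FF61..U+FF9F) among other characters.
    static String toFullWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfWidthKa = "\uFF76"; // same character as the halfWidthKa bytes in the test
        String fullWidthKa = "\u30AB"; // same character as the fullWidthKa bytes in the test

        // Half-width input folds to the full-width form.
        System.out.println(toFullWidth(halfWidthKa).equals(fullWidthKa));
        // Full-width input is left unchanged.
        System.out.println(toFullWidth(fullWidthKa).equals(fullWidthKa));
    }
}
```

Note that NFKC is broader than katakana folding alone: it also maps full-width Latin letters to ASCII and applies other compatibility mappings, so a real analyzer change might prefer a narrower mapping table. Where such folding would be hooked in (on the Reader before tokenizing, or on each Token afterwards) is left open here.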