While indexing Turkish web pages, "Parse Aborted: Lexical error...." occurs
---------------------------------------------------------------------------
                 Key: LUCENE-2246
                 URL: https://issues.apache.org/jira/browse/LUCENE-2246
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 3.0
            Reporter: Selim Nadi

When I try to index a Turkish page, the HTML parser fails with a "Parse Aborted: Lexical error.on ... line" error whenever a Turkish-specific character appears inside an HTML tag attribute.

For example, with

<IMG SRC="../images/head.jpg" WIDTH=570 HEIGHT=47 BORDER=0 ALT="ş">

the exception reports the "ş" character (code point 351, which is outside the ASCII range) as the error. The same happens with the "ı" character in a title attribute:

<a title="(ııı)">

Turkish characters in the page content do not cause any problem.
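
To make the report easier to check, here is a minimal sketch that lists the characters above the ASCII range in the two tags quoted in the report. It does not call the Lucene HTML parser and does not reproduce the lexical error itself; the class name NonAsciiScan and the method nonAscii are hypothetical names introduced only for this illustration. The Turkish characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: reports every code point above 127.
public class NonAsciiScan {

    /** Returns one "U+XXXX (decimal)" entry per non-ASCII code point, in order. */
    static List<String> nonAscii(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 127) // ASCII ends at 127
            .forEach(cp -> found.add(String.format("U+%04X (%d)", cp, cp)));
        return found;
    }

    public static void main(String[] args) {
        // The two tags from the report; \u015F is "ş" and \u0131 is "ı".
        String img = "<IMG SRC=\"../images/head.jpg\" WIDTH=570 HEIGHT=47 BORDER=0 ALT=\"\u015F\">";
        String anchor = "<a title=\"(\u0131\u0131\u0131)\">";

        System.out.println(nonAscii(img));    // [U+015F (351)]
        System.out.println(nonAscii(anchor)); // [U+0131 (305), U+0131 (305), U+0131 (305)]
    }
}
```

The value 351 matches the number given in the report for "ş". Both characters fall outside ASCII, which is consistent with the parser rejecting them inside attribute values, although the scan alone does not show why content text is accepted.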