Hi, I want to add support for Danish to the HTMLParser of Lucene. Here is what I did:
1) In HTMLParser.jj I added the Danish letters to a token:

   < #LET: ["A"-"Z","a"-"z","0"-"9","æ","ø","å","Æ","Ø","Å"] >

2) In StandardTokenizer.jj I added these ranges to the "#LETTER:" token definition:

   "\u0080"-"\u00ff", "\u0100"-"\u017f", "\u0180"-"\u024f", "\u00c0"-"\u00d6"

After compiling and re-indexing, searching for words that contain special characters finds nothing. For example, a search for "civilingeniør" returns no results.

When I print the comment string of a hit (found by searching for a nearby word instead), my browser shows "civilingeniÃ¸r". So somewhere the parser is mapping the characters wrongly.

What am I doing wrong?

Thanks.
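The garbled form shown in the browser can be reproduced with a small sketch. It assumes (the post does not say which encodings are involved) that the text is UTF-8 at one stage and is decoded as ISO-8859-1 at another. The class name MojibakeDemo and the method misdecode are hypothetical and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1.
    // Every non-ASCII letter becomes two wrong characters.
    static String misdecode(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00f8 is the Danish letter o-with-stroke; the escape keeps this
        // source file independent of the compiler's file encoding.
        String word = "civilingeni\u00f8r";
        String garbled = misdecode(word);
        System.out.println(garbled);
        // The single letter is now the pair U+00C3, U+00B8.
        System.out.println(garbled.length() - word.length());
    }
}
```

If the same mismatch occurs while the HTML files are read for indexing, the indexed terms would hold the two-character form, which might explain why a query typed with the real letter finds nothing. That is a guess based on the symptom, not a confirmed diagnosis.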
