[ https://issues.apache.org/jira/browse/LUCENE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721457#action_12721457 ]
Robert Muir commented on LUCENE-1692:
-------------------------------------

Michael,

I think it would be nice to fix the Thai offset bug so that the highlighter will work. This is a safe one-line fix, and it's an obvious error.

The SmartChineseAnalyzer empty token bug is pretty serious. I think indexing an empty token for every piece of punctuation could really hurt similarity computation (am I wrong? I have never tried it).

The Thai .type() bug is something that could be fixed later. I don't think the token type being ALPHANUM versus NUM is really hurting anyone.

As for the issue where DutchAnalyzer doesn't do what it claims, I don't think that's really hurting anyone either, and users can switch to the snowball version if they want accurate snowball behavior.

I do think the huge unused files in DutchAnalyzer can be removed if you want to save 1MB, but I'm not sure how important that is.

Let me know your thoughts.

> Contrib analyzers need tests
> ----------------------------
>
>                 Key: LUCENE-1692
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1692
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt,
> LUCENE-1692.txt
>
>
> The analyzers in contrib need tests, preferably ones that test the behavior
> of all the Token 'attributes' involved (offsets, type, etc.) and not just what
> they do with token text.
> This way, they can be converted to the new API without breakage.
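
For illustration only, here is a rough sketch of the kind of attribute-level check the issue asks for. It deliberately does not use the Lucene API: the Tok class and the check method are hypothetical stand-ins, not code from the attached patches. It also assumes a tokenizer-only stage, where each token's text should equal the input substring at its offsets (a stemming or lowercasing filter would break that assumption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenAttributeCheck {

    /** Hypothetical stand-in for an analyzer token: text, offsets, and type. */
    public static final class Tok {
        public final String text;
        public final int start;
        public final int end;
        public final String type;

        public Tok(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /**
     * Checks tokens against the input they came from and returns a list of
     * problems. An empty list means no problem was found.
     */
    public static List<String> check(String input, List<Tok> tokens) {
        List<String> problems = new ArrayList<String>();
        int lastStart = 0;
        for (Tok t : tokens) {
            // Empty tokens (for example, one per punctuation mark) should never be emitted.
            if (t.text.isEmpty()) {
                problems.add("empty token at offset " + t.start);
                continue;
            }
            // Offsets must stay inside the input and be ordered.
            if (t.start < 0 || t.start > t.end || t.end > input.length()) {
                problems.add("offsets out of range for '" + t.text + "'");
                continue;
            }
            if (t.start < lastStart) {
                problems.add("offsets go backwards at '" + t.text + "'");
            }
            // Assumption: no filter has altered the text, so offsets must point at it.
            if (!input.substring(t.start, t.end).equals(t.text)) {
                problems.add("offsets " + t.start + "-" + t.end
                        + " do not match text '" + t.text + "'");
            }
            lastStart = t.start;
        }
        return problems;
    }

    public static void main(String[] args) {
        // The second token's offsets are shifted by one, the sort of
        // error that breaks highlighting.
        List<Tok> tokens = Arrays.asList(
                new Tok("ab", 0, 2, "ALPHANUM"),
                new Tok("cd", 2, 4, "ALPHANUM"));
        for (String problem : check("ab cd", tokens)) {
            System.out.println(problem);
        }
    }
}
```

A test along these lines would flag both an offset error and empty tokens, while a test that only compares token text would miss the offset error entirely. Checking the type attribute (ALPHANUM versus NUM) would need an expected value per token and is left out of this sketch.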