http://issues.apache.org/bugzilla/show_bug.cgi?id=23650

------- Additional Comments From [EMAIL PROTECTED] 2006-03-03 00:16 -------

This is also happening in the Perl port of Lucene named Plucene (at the time of writing, the latest version on CPAN is 1.24).

I have tracked this down to which characters the tokenizer allows. If I use WhitespaceAnalyzer (since I want to cover Swedish characters and other characters outside the a-z range that SimpleAnalyzer uses), the default token pattern of the WhitespaceTokenizer (which the WhitespaceAnalyzer uses) is:

    sub token_re { qr/\S+/ }

With the default tokenizer above, indexing fails with an error similar to:

    Docs out of order (44 < 53) at /usr/local/share/perl/5.8.4/Plucene/Index/SegmentMerger.pm line 149.

But when I change the token_re function to:

    sub token_re { qr/[a-z\d]+/ }

which only allows a-z and 0-9, indexing has no problems whatsoever (at least I don't get the above error message).

But when I add Swedish characters to the tokenizer, such as:

    sub token_re { qr/[a-zåäö\d]+/ }

the "Docs out of order" error returns...

Kind Regards
Apachez
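To compare what the three token_re patterns above actually let through, here is a minimal standalone sketch. It is not Plucene code: the tokens() helper and the sample text are hypothetical, and it assumes "use utf8" with the input handled as a character string, which may not match how Plucene reads its input. It only shows which tokens each pattern produces; it does not by itself explain the "Docs out of order" error.

```perl
use strict;
use warnings;
use utf8;    # assumption: the patterns and sample text are UTF-8 source literals

# The three token patterns quoted in the comment above.
my %token_re = (
    whitespace => qr/\S+/,
    ascii      => qr/[a-z\d]+/,
    swedish    => qr/[a-zåäö\d]+/,
);

# Hypothetical helper (not part of Plucene): return every match of $re
# in $text, in order, as an array reference.
sub tokens {
    my ($re, $text) = @_;
    return [ $text =~ /$re/g ];
}

# Hypothetical sample input containing Swedish characters.
my $text = "räksmörgås och 2 öl";

binmode STDOUT, ':encoding(UTF-8)';
for my $name (qw(whitespace ascii swedish)) {
    print "$name: ", join(' | ', @{ tokens($token_re{$name}, $text) }), "\n";
}
```

With this sample, the a-z-only pattern breaks each Swedish word into ASCII fragments (r, ksm, rg, s, ...), while the other two patterns keep the words whole, so only those two ever put terms containing å, ä or ö into the index.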