There are no Unicode characters in the Java sources, so that didn't make any difference...
I'm suspecting Subversion now: the stemsUnicode.txt and wordsUnicode.txt files are encoded in UTF-16 (they have the proper two-byte byte-order mark) and have the svn:eol-style property set to native. On my (Windows :( ) system the files are 904424 and 1101164 bytes long and are full of "0d 0a 00" byte sequences, which in UTF-16 should probably be just "0a 00" or "0d 00 0a 00". On Mac the "0a" bytes won't be touched by svn.

Is there a way to do an svn update --raw or something similar so that I can check this?

If this is indeed the problem, a possible fix would be to set svn:eol-style to LF, or else to let svn know that the file is in Unicode (perhaps by setting the svn:mime-type property to something other than the default?).

Luc

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, 11 February 2005 16:33
To: Lucene Developers List
Subject: Re: TestCase for KeywordAnalyzer split into KeywordTokenizer/KeywordAnalyzer

On Feb 11, 2005, at 9:04 AM, Vanlerberghe, Luc wrote:

> Here's the diff for the TestCase 'inline'.
> It should be applied in
> contrib\analyzers\src\test\org\apache\lucene\analysis
>
> The failure in the Russian Analyzer is unrelated (I updated all
> sources to HEAD i.e. 153399 to be sure) but you probably need the
> Russian fonts to see the error: unicode expected:<?????????> but
> was:<???????????>

My guess is it's a file encoding issue on your system. The files should be in UTF8 encoding. The build file has a parameter you can adjust:

    ant -Dbuild.encoding=utf-8

All is well for me running on Mac OS X with a fresh Subversion checkout.

    Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
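
For the byte-level check on stemsUnicode.txt and wordsUnicode.txt described in the message above, a minimal Java sketch might look like the following. The class name Utf16EolCheck and its methods are hypothetical helpers (not part of Lucene or Subversion), and the sketch assumes the files are little-endian UTF-16, which the "0a 00" sequences suggest. It only counts raw byte patterns, so the numbers are a heuristic rather than proof of damage.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for inspecting a checkout; not part of Lucene or Subversion.
class Utf16EolCheck {

    // A byte-wise LF -> CRLF conversion turns the UTF-16LE line feed "0a 00"
    // into "0d 0a 00", which is no longer valid two-byte-aligned text.
    static final byte[] DAMAGED_EOL = {0x0D, 0x0A, 0x00};

    // A correctly encoded UTF-16LE CRLF is "0d 00 0a 00".
    static final byte[] PROPER_CRLF = {0x0D, 0x00, 0x0A, 0x00};

    /** Counts how often the byte pattern occurs anywhere in data. */
    static int countSequence(byte[] data, byte[] pattern) {
        int count = 0;
        for (int i = 0; i + pattern.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    /** Returns true if data starts with the UTF-16LE byte-order mark "ff fe". */
    static boolean hasUtf16LeBom(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xFE;
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the files to inspect as command-line arguments.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + data.length + " bytes"
                    + ", UTF-16LE BOM: " + hasUtf16LeBom(data)
                    + ", \"0d 0a 00\": " + countSequence(data, DAMAGED_EOL)
                    + ", \"0d 00 0a 00\": " + countSequence(data, PROPER_CRLF));
        }
    }
}
```

Running this against the same two files on a Windows and a Mac checkout and comparing the counts would show whether the end-of-line conversion is what differs between the two systems.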