Hi Hilko,

Hilko Bengen <[email protected]> writes:

> BTW, I just tried passing 'äöü' as a Latin1-encoded string (bytes e4 f6
> fc) to csearch. This led to regexp/syntax failing with an "invalid
> UTF-8" error, so this does not work, even if the character encoding of
> the search term matches that of the index.

Yep, that is what I suspected. Only UTF-8 is supported.
> A "proper" solution would probably involve guessing the character set of
> a text file and converting it if necessary before indexing. Meh.

> How are you dealing with this in codesearch.debian.net?

I just assume everything is UTF-8. If a file is not, and actually contains
non-ASCII characters, it needs to be converted to UTF-8. I mean, come on.
This is 2013. Just convert the files already! :-)

-- 
Best regards,
Michael

