> There might be two different issues.
>
> 1. The encoding was set to utf-8-default, which means no suitable encoding
> was detected and eric simply chose the default setting. My question is: was
> enhanced encoding detection activated on the Editor->Filehandling page of
> the config dialog? What is the correct encoding of the file? Would you
> please send it.
>
> 2. The styling may be wrong. Please check if selecting "Alternative:
> Django/Jinja" as the lexer language (via the editor context menu) gives
> correct highlighting.
>
> Regards,
> Detlev

1. The correct file encoding is utf-8 (the system-wide locale); utf-8 is
also set on the Filehandling page. Moreover, I have tried turning encoding
detection off, with no result. Here is my file:
http://www.mediafire.com/?fdytlqdb51z
2. Because I don't have the Jinja plugin installed, I tried other lexers,
such as HTML/PHP, Python and others -- all of them work fine with the same
file(s). (I mean, there was no pretty Django highlighting, of course, but
the string mangling was gone too.)

3. Finally, I'd like you to pay attention to my previous messages: the
problem seems to go away if I use str() instead of unicode(). According to
http://boodebr.org/main/python/all-about-python-and-unicode , len() on a
unicode string returns the number of characters, which need not match the
number of bytes. Now I see that str() only works because it encodes the
unicode string to ASCII. So it likely won't work with, for example, a
Japanese locale, which cannot be represented in ASCII. Look at the short
demo I prepared: http://img571.imageshack.us/img571/4782/snapshot2g.png
(I show only the first "block" tag; the text below it is unimportant):

1. Everything works great until I use Russian.
2. English, for example, works fine.
3. I typed one Russian letter. At that moment the lexer no longer
highlighted the closing tag fully; note that exactly one character is
highlighted wrongly.
4. I typed a second letter, and you can see that the lexer now leaves two
characters unhighlighted.
5. Each added Russian letter causes the lexer to "forget" to highlight one
more character of the closing tag.

So here is my explanation: one Russian letter takes two bytes in my
encoding. len(unicode(one_russian_letter)) returns 1, as expected, but the
lexer obviously assumes that one letter takes one byte. That gives a
one-byte shift per letter and the character corruption (did you notice
that the lexer corrupts only an odd number of characters?), so the lexer
misjudges the length of non-English strings. And if I replace unicode()
with str(), the strings are encoded to one-byte ASCII and the lexer works
fine. (A small snippet demonstrating the mismatch follows at the end of
this message.)

So, two conclusions:

1. Eric's lexer subsystem works with strings as plain ASCII strings, not
unicode.
2. The other lexers (Python, HTML/PHP) work fine because they convert the
strings to ASCII.

Please correct me if I'm wrong.

P.S.: Absolutely the same happens if I keep unicode() and just add
.encode() to it in the styleText method, so that the line reads:

    for token, txt in self.__lexer.get_tokens(unicode(self.editor.text()).encode()[:end + 1]):

Also, excuse my not-so-good English and my habit of writing long, boring
text.
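To make the byte-vs-character argument above concrete, here is a minimal
Python 2 sketch (eric4 runs on Python 2); the strings are illustrative,
not taken from my template:

    # -*- coding: utf-8 -*-
    # One Cyrillic letter is one character, but two bytes in UTF-8.
    ascii_word = u"endblock"  # 8 characters, 8 bytes
    cyr_word = u"блок"        # 4 characters, 8 bytes

    print len(ascii_word), len(ascii_word.encode("utf-8"))  # -> 8 8
    print len(cyr_word), len(cyr_word.encode("utf-8"))      # -> 4 8

    # If QScintilla's styling positions are byte offsets into the UTF-8
    # buffer, a lexer that passes character counts falls one position
    # behind for every two-byte letter -- exactly the growing
    # mis-highlighting of the closing tag in my screenshot.

If that diagnosis is right, I would guess the fix is for styleText to hand
setStyling() the byte length of each token, e.g. len(txt.encode("utf-8")),
rather than len(txt) -- but that is only a guess on my part; I have not
checked it against eric's sources.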
_______________________________________________
Eric mailing list
[email protected]
http://www.riverbankcomputing.com/mailman/listinfo/eric
