--- Dirk Meyer <[EMAIL PROTECTED]> wrote:
> Gustavo Sverzut Barbieri wrote:
> > Hello,
> >
> > Just committed the necessary changes to support non-latin-1
> > filenames.
> >
> > It Works-For-Me (TM) using LANG=pt_BR.UTF-8 and non-ASCII
> > (Portuguese) chars, so it should work with others; please test.
> >
> > I just tested the Video, Audio and Image modules, 'cause I don't
> > have the games/commands modules, so if you have them and use
> > non-ASCII chars, please test.
>
> I can't test it right now, but I also had the idea to do it. So
> you've done it, great. But I guess we need some testing, because I
> know there are some bad things in the string/unicode world:

Yes, there are... If you check the CVS logs you'll see my changes are
very localized, but I spent a lot of time making it work... and
discovering where it breaks... As you mention below, it generally
breaks when doing str(), because Python uses ASCII as the default
encoding.

One possible solution is to keep everything unicode as far as possible
and, when we need to make it a string, use some test like:

    if type( possible_string ) == unicode:
        s = possible_string.encode( "utf-8" )

or some other encoding, like "latin-1". I'm for using utf-8 for output
since it works everywhere, given you have the font.

> The default encoding in Python 2.3 is 'ascii'. You can't change
> that. If you have a string (when I say string, I mean str()) with
> non-ASCII chars (e.g. latin-1),

Talking to people in #python I discovered a module called "site" which
lets you change it. But from what I understand, it changes it for the
whole system, not just the program.

> you can't just run unicode(mystring); it
> would cause a UnicodeError, because Freevo wants to use the 'ascii'
> encoding. The correct way is unicode(mystring, 'latin-1') or, in our
> case, replace latin-1 with config.LOCALE.

Now we have config.encoding, which uses (in order) FREEVO_LOCALE, LANG
and LC_ALL. It contains the second element of the pair
LANGUAGE.ENCODING. If you want Freevo to use UTF-8 (my case), just do:

    FREEVO_LOCALE=pt_BR.UTF-8 freevo

> Now we can keep all internal
> strings as unicode. But the problem is also the way back.

Yes. And I would appreciate it if you devs looked at my changes and
checked whether I'm converting to string (.encode(...)) where we could
keep it unicode.

> Some
> functions like os.X want string objects. You can pass string objects
> with non-ASCII characters, no problem, _but_ if you pass unicode
> objects with non-ASCII in them, it will use the default encoding
> (ascii again) and will raise a UnicodeError again.

If you use os.listdir( u'string' ) it returns (if possible; if not, it
returns a string. I must check.
I did it in one place; maybe we need to check others too) a list of
unicode objects. Others, like statvfs, don't, so you need the
.encode( ... ).

The major problem I see is with metadata. I saw that ogg uses utf-8
internally, but if others don't it will become a real mess, since the
metadata probably comes from the internet and then you have no way to
guess which encoding to use.

> You did the start of changing all internal strings to Unicode,
> great. But we should search for the following stuff:
>
> Every string that goes into Freevo must be Unicode. If it comes from
> fxd files, this is no problem because the xml parser uses unicode. On
> the other hand, we have directory listings; you convert them to
> Unicode. I had some bad problems with that, let's see if your
> implementation works better than my first draft. Second is the
> outgoing direction. Every function needing a string should get a
> Unicode object converted with the current locale. I read about
> unicode versions of all os operations, but I couldn't find them.

I always use .encode()

> It will need some time to convert all parts of Freevo. Since you use
> even non-Latin-1 (what charset is it?), we should be able to trace
> all the bugs. But I expect Freevo to be unstable because of this for
> at least 2 weeks. But it had to be done sooner or later, so good work.

My charset could be considered latin-1 (iso8859-1), but it contains
chars outside the ASCII range, and as I'm using utf-8 as my encoding
each of them becomes 2 bytes long... when you transform it to ASCII it
becomes two weird chars...

I committed it early so people could test. I don't use Freevo daily and
have few non-ASCII filenames/metadata, so I can't test it much more.

Gustavo
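
For anyone following the thread, here is a minimal sketch of the approach
discussed above: keep text as unicode internally and convert only at the
boundaries, using the ENCODING part of FREEVO_LOCALE, LANG or LC_ALL. The
helper names (encoding_from_env, to_text, to_bytes), the "ascii" fallback
and the stripping of an "@modifier" suffix are assumptions made for
illustration; this is not Freevo's actual config.encoding code.

```python
import os

TEXT = type(u"")   # unicode on Python 2, str on Python 3
BYTES = type(b"")  # str on Python 2, bytes on Python 3


def encoding_from_env(environ=None, default="ascii"):
    """Return the ENCODING part of the first LANGUAGE.ENCODING value found.

    Checks FREEVO_LOCALE, LANG and LC_ALL in that order, as described in
    the message. Hypothetical helper; values without a "." are skipped.
    """
    if environ is None:
        environ = os.environ
    for name in ("FREEVO_LOCALE", "LANG", "LC_ALL"):
        value = environ.get(name, "")
        if "." in value:
            # "pt_BR.UTF-8@euro" -> "UTF-8" (modifier handling is an assumption)
            return value.split(".", 1)[1].split("@", 1)[0]
    return default


def to_text(value, encoding):
    """Decode byte strings on the way in; leave text untouched."""
    if isinstance(value, BYTES):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding):
    """Encode text on the way out; leave byte strings untouched."""
    if isinstance(value, TEXT):
        return value.encode(encoding)
    return value


# Usage: a UTF-8 filename goes in as bytes and comes back out unchanged.
enc = encoding_from_env({"FREEVO_LOCALE": "pt_BR.UTF-8"})
name = to_text(b"can\xc3\xa7\xc3\xa3o.ogg", enc)
assert to_bytes(name, enc) == b"can\xc3\xa7\xc3\xa3o.ogg"
```

Checking the type before converting avoids the implicit ASCII conversion
that raises UnicodeError when str and unicode objects get mixed. It does
nothing for metadata of unknown encoding, which still has to be guessed.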
