--- Dirk Meyer <[EMAIL PROTECTED]> wrote:
> Gustavo Sverzut Barbieri wrote:
> > Hello,
> >
> > Just committed the necessary changes to support non-latin-1
> > filenames.
> >
> > It Works-For-Me (TM) using LANG=pt_BR.UTF-8 and non-ascii
> > (Portuguese) chars, so it should work with others; please test.
> > I just tested the Video, Audio and Image modules, because I don't
> > have the games/commands modules, so if you have them and use
> > non-ascii chars, please test.
> 
> I can't test it right now, but I also had the idea to do it. So
> you've done it, great. But I guess we need some testing, because I
> know there are some bad things in the string/unicode world:

Yes, there are... If you check the CVS logs you'll see my changes are
very localized, but I spent a lot of time making it work and
discovering where it breaks. As you mention below, it generally breaks
when doing str(), because Python uses ASCII as the default encoding.
One possible solution is to keep it unicode as far as possible, and
when we need to make it a string, test first, like:

# only unicode objects need encoding; a plain str is passed on as-is
if isinstance(possible_string, unicode):
   s = possible_string.encode("utf-8")

or some other encoding, like "latin-1". I'm for using utf-8 for output
since it works everywhere, given you have the font.
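
A minimal sketch of that check, written in Python 3 terms as an
assumption: there, `str` plays the role of Python 2.3's `unicode` and
`bytes` plays the role of the old `str`. The helper name `to_output` is
hypothetical and not something that exists in Freevo.

```python
def to_output(value, encoding="utf-8"):
    """Return bytes for output, encoding only if value is still text."""
    if isinstance(value, str):
        return value.encode(encoding)
    # Already bytes: pass it through untouched.
    return value


# Text is encoded, bytes are left alone.
encoded = to_output("a\u00e7\u00e3o")
unchanged = to_output(b"plain")
```

The point of the type check is that encoding happens once, at the
output boundary, and everything before it can stay unicode.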

 
> The default encoding in Python 2.3 is 'ascii'. You can't change
> that. If you have a string (when I say string, I mean str()) with
> non-ascii chars (e.g. latin-1),

Talking to people in #python I discovered a module called "site" which
lets you change it. But from what I understand, it changes it for the
whole system, not just the program.


> you can't just run unicode(mystring); it
> would cause a UnicodeError, because Freevo wants to use the 'ascii'
> encoding. The correct way is unicode(mystring, 'latin-1') or in our
> case replace latin-1 with config.LOCALE.

Now we have config.encoding, which is taken from (in order)
FREEVO_LOCALE, LANG and LC_ALL. It holds the second element of the
LANGUAGE.ENCODING pair. If you want freevo to use UTF-8 (my case), just
do:
"FREEVO_LOCALE=pt_BR.UTF-8 freevo"

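The lookup described above could be sketched roughly like this. This
is not the actual Freevo code: the function name `detect_encoding`, the
"ascii" fallback and the stripping of an "@modifier" suffix are all
assumptions made for illustration. It is written for Python 3, where
`bytes` stands in for Python 2.3's `str`.

```python
import os


def detect_encoding(environ=None, default="ascii"):
    """Pick the ENCODING part of LANGUAGE.ENCODING from the environment."""
    if environ is None:
        environ = os.environ
    # Same order as described: FREEVO_LOCALE, then LANG, then LC_ALL.
    for var in ("FREEVO_LOCALE", "LANG", "LC_ALL"):
        value = environ.get(var, "")
        if "." in value:
            # "pt_BR.UTF-8" -> "UTF-8"; drop a trailing "@modifier" if any.
            return value.split(".", 1)[1].split("@", 1)[0]
    # Assumed fallback when no variable carries an encoding.
    return default


# Decode an incoming latin-1 byte string with the detected encoding.
encoding = detect_encoding({"LANG": "de_DE.ISO-8859-1"})
name = b"caf\xe9".decode(encoding)
```

A variable that has no ".ENCODING" part is skipped here and the next
one is tried, which is a design choice of this sketch.
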
> Now we can keep all internal
> strings as unicode. But the problem is also the way back. 

Yes. And I would appreciate it if you devs look at my changes and check
whether I'm converting to string (.encode(...)) in places where we
could keep it unicode.


> Some
> functions like os.X want string objects. You can pass string objects
> with non-ascii characters, no problem, _but_ if you pass unicode
> objects with non-ascii in them, it will use the default encoding (ascii
> again) and will raise a UnicodeError again.

If you call os.listdir( u'string' ) it returns a list of unicode
objects if possible; if not, it returns plain strings (must check). I
handled that in one place, maybe we need to check others too.

Others like statvfs don't, so you need the .encode( ... )
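
A rough sketch of the "check what listdir gave back" idea, in Python 3
terms as an assumption: there, os.listdir() returns `str` names for a
`str` path and `bytes` names for a `bytes` path, so `bytes` stands in
for the plain strings of Python 2.3. The helper name `list_names` is
hypothetical.

```python
import os
import tempfile


def list_names(directory, encoding="utf-8"):
    """Return directory entries as text, decoding byte names if needed."""
    names = []
    for name in os.listdir(directory):
        if isinstance(name, bytes):
            # Undecodable bytes are replaced rather than raising.
            name = name.decode(encoding, "replace")
        names.append(name)
    return names


# Same directory listed through a text path and through a bytes path.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "track.ogg"), "w").close()
    from_text = list_names(tmp)
    from_bytes = list_names(os.fsencode(tmp))
```

Either way the caller only ever sees text, which is what keeping
everything unicode internally needs.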

The major problem I see is with metadata. I saw that ogg uses utf-8
internally, but if others don't it will become a real mess, since the
metadata probably comes from the internet and then you have no way to
guess what encoding to use.
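
One possible way to cope, as a sketch only: try utf-8 first and fall
back to latin-1. This is a heuristic assumed for illustration, not
something Freevo does, and the name `decode_metadata` is hypothetical.
It is written for Python 3, where `bytes` stands in for Python 2.3's
`str`.

```python
def decode_metadata(raw, candidates=("utf-8", "latin-1")):
    """Decode raw tag bytes by trying each candidate encoding in turn."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep the ASCII part and mark the rest.
    return raw.decode("ascii", "replace")


from_utf8 = decode_metadata(b"caf\xc3\xa9")   # valid utf-8
from_latin1 = decode_metadata(b"caf\xe9")     # not utf-8, read as latin-1
```

Since latin-1 accepts any byte sequence, it never fails, so tags in
some third encoding would still decode, just to the wrong characters.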

> You made a start on changing all internal strings to Unicode,
> great. But we should search for the following stuff:
> 
> Every string that goes into Freevo must be Unicode. If it comes from
> fxd files, this is no problem because the xml parser uses unicode. On
> the other hand, we have directory listings, you convert them to
> Unicode. I had some bad problems with that, let's see if your
> implementation works better than my first draft. Second is the
> outgoing. Every function that needs a string should get a Unicode
> object converted with the current locale. I read about unicode
> versions of all os operations, but I couldn't find them.

I always use .encode()

> It will need some time to convert all parts of Freevo. Since you use
> even non-Latin-1 (what charset is it?), we should be able to trace
> all the bugs. But I expect Freevo to be unstable because of this for
> at least 2 weeks. But it had to be done sooner or later, so good work.

My charset could be considered latin-1 (iso8859-1), but it contains
chars outside the ASCII range, and as I'm using utf-8 as my encoding
each of those becomes 2 bytes long... when you transform it to ascii it
becomes two weird chars...
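
A small illustration of the two-byte effect, in Python 3 terms as an
assumption (`str` there is Python 2.3's `unicode`). Reading the bytes
back as latin-1 is used here to show the "two weird chars", since a
strict ascii decode would raise an error instead.

```python
text = "\u00e9"                           # e with acute accent, outside ASCII
utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")    # each byte read as its own character

try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # Neither byte is in the ASCII range, so strict decoding fails.
    ascii_ok = False
```

So one character on screen turns into two unrelated characters as soon
as the utf-8 bytes are interpreted with a single-byte charset.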

I committed it early so people could test. I don't use freevo daily and
have few non-ascii filenames/metadata, so I can't test it much more.

Gustavo

_______________________________________________
Freevo-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/freevo-devel