Hi Harri,

Nice bit of work this, many thanks.

> Hi,
> 
> I took a look at the unicode handling in freevo, and attached is a small
> patch to fix a couple of things:
> 
> 1. urllib.quote() only handles strings, not unicode strings

This should not be a problem with file names and directory names, as
urllib.quote() should only ever be called on plain strings. Perhaps we
should go through the calls to be certain :)
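
A minimal sketch of the workaround for point 1, written in Python 3 syntax
so it can be run today (an assumption; the patch targets Python 2, where the
call is urllib.quote()). The idea is to encode the unicode name to bytes
explicitly before quoting, so the result depends on a chosen encoding rather
than on the default one. The helper name quote_name is made up for
illustration and is not part of freevo.

```python
from urllib.parse import quote  # Python 2 spelling: urllib.quote


def quote_name(name, encoding="utf-8"):
    """Percent-quote a text name after encoding it explicitly.

    quote_name is a hypothetical helper, not part of freevo.
    """
    return quote(name.encode(encoding))


print(quote_name("tämä"))             # quoted utf-8 bytes
print(quote_name("tämä", "latin-1"))  # quoted latin-1 bytes
```

The same name gives two different quoted strings depending on the encoding,
which is one reason to pick the encoding in a single place.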

> 2. Add sitecustomize.py to set the freevo default encoding to 'utf-8', so
> that each "".encode() call does not need to specify it.  Note that encoding
> should not be hard coded all over the place, there are things like:
> 
> search_string = '%s %s' % (artist.encode('latin-1'), album.encode('latin-1'))

IIRC this is from coversearch, and I think the explicit encoding is
necessary here because the Amazon cover search requires iso-8859-1 strings.
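
A sketch of how that requirement could live in one place instead of being
repeated per call. It is in Python 3 syntax, and both the constant
AMAZON_ENCODING and the helper build_search_string are hypothetical names.
That Amazon needs iso-8859-1 is only recalled above, not verified.

```python
# Assumption: the cover search wants iso-8859-1 bytes, as recalled above.
AMAZON_ENCODING = "iso-8859-1"


def build_search_string(artist, album, encoding=AMAZON_ENCODING):
    """Join the text fields first, then encode once at the boundary.

    Characters the target encoding cannot represent become '?'
    instead of raising UnicodeEncodeError.
    """
    text = "%s %s" % (artist, album)
    return text.encode(encoding, "replace")


print(build_search_string("André", "tämä"))
```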

> in many places.  The String() and Unicode() helper functions should usually
> be used instead when necessary.
> 
> 3. I added another fallback to the Unicode() helper function to use
> 'iso-8859-15' if decoding to unicode fails with the default (utf-8).  This I did to
> handle the filenames.  Problem with filesystems is that those are not usually 
> unicode aware.  That means the user's locale defines how the filenames are 
> encoded.  So if my locale is iso-8859-15, a name like "tämä" will have 
> different bytes on disk than if my locale is "utf-8".  This happens probably 
> most often when moving files between machines having different locales, but 
> you can simulate the effect with something like:
> 
> os.mkdir("Andr\xe9".decode("latin-1").encode("latin-1"))
> os.mkdir("Andr\xe9".decode("latin-1").encode("utf-8"))
> 
> That will give you two directories "André", but with different encodings.
> 
> 
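
The fallback in point 3 can be sketched roughly as follows, in Python 3
syntax. This is not freevo's actual Unicode() helper; to_unicode is a
hypothetical stand-in that tries utf-8 first and falls back to iso-8859-15.

```python
def to_unicode(raw, encodings=("utf-8", "iso-8859-15")):
    """Decode raw bytes by trying each encoding in order."""
    if isinstance(raw, str):
        return raw  # already text, nothing to decode
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as utf-8 and replace undecodable bytes.
    return raw.decode("utf-8", "replace")


latin_name = "André".encode("latin-1")  # one byte for the accent
utf8_name = "André".encode("utf-8")     # two bytes for the accent

print(latin_name != utf8_name)                          # different bytes on disk
print(to_unicode(latin_name) == to_unicode(utf8_name))  # same text once decoded
```

The order matters: utf-8 rejects most byte sequences that were not written
as utf-8, whereas a single-byte encoding such as iso-8859-15 accepts nearly
anything, so it only makes sense as the second attempt.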
> Dealing with unicode and different encodings can be sometimes confusing.
> Personally I find it helpful to think of it as follows:
> 
> There is usually a pair, unicode string and raw string. The unicode string 
> includes metadata, it knows about its encoding.  The raw python string is 
> just a bytestream.  The convention is that the bytestream contains ascii, but 
> it can contain anything.
> 
> So unicode("abc") will take the bytestream "abc" and turn it to a unicode 
> string, and all is well as the default encoding is ascii.
> 
> Now unicode("äläpäs") will fail, unless you have changed the default
> encoding.  It is equivalent to "äläpäs".decode().  But the string is not pure 
> ascii, and thus it bails out with something like "UnicodeDecodeError: 'ascii' 
> codec can't decode byte ..."
> 
> So if you have a raw python string containing anything more exotic than
> ascii, and you want to convert it to unicode, you must explicitly tell the
> encoding of the string.  You can also change the default encoding from ascii
> to something else, but only in site.py or sitecustomize.py
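
The failure described above can be reproduced in a few lines. This sketch
uses Python 3 syntax, where the raw string is a bytes object and the implicit
ascii decode of Python 2's unicode() has to be spelled out as
.decode("ascii"). The variable names are only illustrative.

```python
raw = "äläpäs".encode("utf-8")  # a byte string, as it might sit on disk

try:
    raw.decode("ascii")  # roughly what Python 2's unicode(raw) did by default
    decoded_as_ascii = True
except UnicodeDecodeError as err:
    decoded_as_ascii = False
    print("ascii failed:", err.reason)

text = raw.decode("utf-8")  # naming the encoding explicitly works
print(text == "äläpäs")
```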

I see sys.setdefaultencoding('utf-8') in the patch, thanks again for this.
Does sitecustomize.py get called before site.py? The reason I ask is that
sys.setdefaultencoding() gets deleted by site.py.
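
For reference, a sketch of what such a sitecustomize.py might contain. In
Python 2, site.py is understood to import sitecustomize itself and to remove
sys.setdefaultencoding() only afterwards; that ordering is worth checking
against the installed site.py. The hasattr guard is an addition of this
sketch so the file is harmless on Python 3, where the function no longer
exists.

```python
# sitecustomize.py (sketch, not the file from the patch)
import sys

# The function is only present while Python 2's site.py is still
# initialising; on Python 3 it does not exist at all.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding("utf-8")
```

Running sys.getdefaultencoding() afterwards shows which default is in effect.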

> Another curve ball is the user locale, what you type in the terminal can look 
> the same to you, but without checking it is impossible to say if the 
> representation on disk will be the same.  For instance, Mandriva 2007 seems 
> to default to utf-8 encoding, and that will result in filenames with accents 
> being different.  Furthermore, if you write something using utf-8 in kwrite, 
> and give the resulting file to your friend who is using latin-1, the contents 
> will not render properly -- he will not see your accents before changing his 
> encoding.
> 
> 
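
The kwrite example can be reproduced in miniature. A sketch in Python 3
syntax: text written as utf-8 and read back as latin-1 decodes without any
error but shows the wrong characters.

```python
written = "tämä".encode("utf-8")     # what a utf-8 editor saves
misread = written.decode("latin-1")  # what a latin-1 reader sees

print(misread)
print(len("tämä"), len(misread))  # each accented letter became two characters

# Reversing the mistake recovers the original text.
print(misread.encode("latin-1").decode("utf-8") == "tämä")
```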
> Hope this helps,

A very nice explanation, it helps clear up a few of those 'I sort of get
it but don't quite' questions.

Duncan


_______________________________________________
Freevo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freevo-users