Hi Harri,
Nice bit of work this, many thanks.
> Hi,
>
> I took a look at the unicode handling in freevo, and attached is a small
> patch to fix a couple of things:
>
> 1. urllib.quote() only handles strings, not unicode strings
This should not be a problem with file names and directory names, as
urllib.quote should only be called on plain strings. Perhaps we should go
through these calls to be certain :)
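
For anyone following along, here is a minimal sketch of the idea behind
point 1: turn the text into bytes in a known encoding before quoting it. It
is written in Python 3 syntax (where bytes is the raw string and str is the
unicode string), not the Python 2 that freevo uses, and quote_text is a
made-up name, not something from the patch.

```python
from urllib.parse import quote


def quote_text(text, encoding="utf-8"):
    """Percent-encode text by first converting it to bytes in a known encoding.

    quote_text is a hypothetical helper for illustration only.
    """
    return quote(text.encode(encoding))


# "t\xe4m\xe4" is the word from Harri's example, written with escapes.
print(quote_text("t\xe4m\xe4"))             # quotes the UTF-8 bytes
print(quote_text("t\xe4m\xe4", "latin-1"))  # same text, different bytes
```

The same name gives two different quoted forms depending on the encoding
chosen, which is why the encoding has to be decided before quoting.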
> 2. Add sitecustomize.py to set the freevo default encoding to 'utf-8', so
> that each "".encode() call does not need to specify it. Note that encoding
> should not be hard coded all over the place, there are things like:
>
> search_string = '%s %s' % (artist.encode('latin-1'), album.encode('latin-1'))
IIRC this is from coversearch, and I think the explicit encoding is
necessary here because the Amazon cover search requires iso-8859-1 strings.
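
If that is right, one way to keep the encoding in a single place would be a
small helper like the sketch below. It uses Python 3 syntax for
illustration, to_latin1_query is a hypothetical name, and the iso-8859-1
requirement is only my recollection, so treat it as an assumption.

```python
def to_latin1_query(artist, album):
    """Build the search string as iso-8859-1 bytes.

    Hypothetical helper: characters that iso-8859-1 cannot represent are
    replaced with '?' instead of raising UnicodeEncodeError.
    """
    text = "%s %s" % (artist, album)
    return text.encode("iso-8859-1", errors="replace")


print(to_latin1_query("Bj\xf6rk", "Debut"))
```

Replacing unencodable characters is a design choice; failing loudly would
be the alternative if a wrong search string is worse than no search.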
> in many places. The String() and Unicode() helper functions should usually
> be used instead when necessary.
>
> 3. I added another fallback to Unicode() helper function to use 'iso-8859-15'
> if the encoding to unicode fails with the default (utf-8). This I did to
> handle the filenames. Problem with filesystems is that those are not usually
> unicode aware. That means the user's locale defines how the filenames are
> encoded. So if my locale is iso-8859-15, a name like "tämä" will have
> different bytes on disk than if my locale is "utf-8". This happens probably
> most often when moving files between machines having different locales, but
> you can simulate the effect with something like:
>
> os.mkdir("Andr\xe9".decode("latin-1").encode("latin-1"))
> os.mkdir("Andr\xe9".decode("latin-1").encode("utf-8"))
>
> That will give you two directories "André", but with different encodings.
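
As I understand the fallback described in point 3, it amounts to something
like the sketch below. This is not the code from the patch: it is Python 3
syntax, and to_unicode and its encodings parameter are names I made up for
illustration.

```python
def to_unicode(raw, encodings=("utf-8", "iso-8859-15")):
    """Decode raw bytes by trying each candidate encoding in order.

    Hypothetical stand-in for the Unicode() helper with a fallback.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw.decode(encodings[0], errors="replace")


# The two directory names from the example above, as bytes on disk.
latin1_name = b"Andr\xe9"
utf8_name = b"Andr\xc3\xa9"
print(to_unicode(latin1_name) == to_unicode(utf8_name))
```

The order matters: utf-8 is tried first because it rejects most byte
sequences that are not utf-8, so the fallback is only a guess for the rest.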
>
>
> Dealing with unicode and different encodings can be sometimes confusing.
> Personally I find it helpful to think of it as follows:
>
> There is usually a pair, unicode string and raw string. The unicode string
> includes metadata, it knows about its encoding. The raw python string is
> just a bytestream. The convention is that the bytestream contains ascii, but
> it can contain anything.
>
> So unicode("abc") will take the bytestream "abc" and turn it to a unicode
> string, and all is well as the default encoding is ascii.
>
> Now unicode ("äläpäs") will fail, unless you have changed the default
> encoding. It is equivalent to "äläpäs".decode(). But the string is not pure
> ascii, and thus it bails out with something like "UnicodeDecodeError: 'ascii'
> codec can't decode byte ..."
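
The failure and the fix can be reproduced with a few lines. This sketch is
in Python 3 syntax, where the decode is always explicit, so the ascii codec
is named by hand to imitate the Python 2 default.

```python
# The bytes a utf-8 terminal or editor would produce for the word above.
raw = "\xe4l\xe4p\xe4s".encode("utf-8")

message = None
try:
    raw.decode("ascii")  # what the Python 2 default encoding would attempt
except UnicodeDecodeError as err:
    message = str(err)

# Naming the real encoding of the bytes makes the conversion work.
text = raw.decode("utf-8")

print(message)
print(len(raw), len(text))
```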
>
> So if you have a raw python string containing anything more exotic than
> ascii, and you want to convert it to unicode, you must explicitly tell the
> encoding of the string. You can also change the default encoding from ascii
> to something else, but only in site.py or sitecustomize.py
I see sys.setdefaultencoding('utf-8') in the patch, thanks again for this.
Does sitecustomize.py get called before site.py? The reason I ask is that
sys.setdefaultencoding() gets deleted by site.py.
> Another curve ball is the user locale, what you type in the terminal can look
> the same to you, but without checking it is impossible to say if the
> representation on disk will be the same. For instance, Mandriva 2007 seems
> to default to utf-8 encoding, and that will result in filenames with accents
> being different. Furthermore, if you write something using utf-8 in kwrite,
> and give the resulting file to your friend who is using latin-1, the contents
> will not render properly -- he will not see your accents before changing his
> encoding.
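
A small sketch of that effect, again in Python 3 syntax: the same name
becomes different bytes under the two encodings, and reading utf-8 bytes as
latin-1 gives the garbled accents your friend would see. The values printed
at the end depend on the machine's locale.

```python
import locale
import sys

name = "t\xe4m\xe4"

as_utf8 = name.encode("utf-8")      # bytes written under a utf-8 locale
as_latin1 = name.encode("latin-1")  # bytes written under a latin-1 locale

# Reading utf-8 bytes with the wrong encoding does not fail, it just
# produces the wrong characters.
misread = as_utf8.decode("latin-1")

print(as_utf8 == as_latin1)
print(ascii(misread))

# What this machine assumes for file names and for text in general.
print(sys.getfilesystemencoding())
print(locale.getpreferredencoding(False))
```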
>
>
> Hope this helps,
A very nice explanation, it helps clear up a few of those 'I sort of get
it but don't quite' questions.
Duncan