As promised, I prepared some text about locale-related (i.e., not only
UTF-8) problems with MP3 players. The sound is always OK, don't worry
about it :)
Many MP3 files include metadata containing the song title, album and
artist name. Such information is located in a so-called ID3 tag. There
are two types of ID3 tags: ID3v1 (not formally standardized) and ID3v2.x
(see the formal standard at http://www.id3.org/).
Audio players that show ID3 tags to the user must interpret bytes that
form the tags as characters. Results depend upon the character encoding
used for such interpretation. E.g., the sequence of bytes 0xC3 0xA5
means the character å in the UTF-8 encoding, the characters Ã¥ in
ISO-8859-1, ĂĽ in ISO-8859-2 and so on. For the user to be able to read
the strings in the tags, the application that originally created the tag
and the program used for its display must agree upon the same encoding.
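
The byte example above can be reproduced with a few lines of Python. This is only a sketch to illustrate the point; the helper name `interpretations` is made up for this example, and the codec names are the ones Python's standard library uses.

```python
# The two bytes from the example above.
raw = bytes([0xC3, 0xA5])


def interpretations(data, encodings):
    """Decode the same bytes under each encoding; return {encoding: text}."""
    return {enc: data.decode(enc) for enc in encodings}


results = interpretations(raw, ["utf-8", "iso-8859-1", "iso-8859-2"])
for enc, text in results.items():
    # One character under UTF-8, two characters under the 8-bit encodings.
    print(f"{enc:12} {len(text)} character(s): {text}")
```

The bytes never change; only the table used to turn them into characters does.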
(Yes, I know that the ISO-8859-2 example creates XML problems for the
book. In HTML, the preferred form for the strange characters mentioned
above would be &#229; &#195;&#165; &#258;&#317;. The first
three characters can also be inserted directly. If the ISO-8859-2
characters create problems with PDF, remove the ISO-8859-2 example.)
The de-facto standard for encoding used in ID3v1 tags is the character
encoding used by MS Windows in the relevant country, because there is no
formal standard, and because WinAMP, one of the most popular MP3
players, cannot display anything else. This de-facto standard is also
honoured by some hardware MP3 players, e.g., HanBiT XDRUM XD-405. Most
MP3 players for Linux, however, assume the character set of the current
locale by default.
In ISO-8859-1 based locales, this assumption is harmless because the
CP1252 code page, used by Windows in those countries, is a superset of
ISO-8859-1 (i.e., for every byte for which ISO-8859-1 assigns a
character, CP1252 assigns the same character). In other countries, where
Windows and Linux use very different character sets (e.g., in Poland,
where Windows uses CP1250 and Linux uses ISO-8859-2), or on Linux
systems that use UTF-8 locales, this assumption leads to incorrect results.
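
To make the superset claim concrete, here is a rough check in Python. It compares the codecs byte by byte over the printable ranges (0x20-0x7E and 0xA0-0xFF). The function name `differing_bytes` and the constant `GRAPHIC_BYTES` are invented for this sketch, and the results reflect Python's codec tables rather than any particular player.

```python
def differing_bytes(enc_a, enc_b, byte_values):
    """Return the byte values that decode to different text in the two encodings."""
    diffs = []
    for value in byte_values:
        raw = bytes([value])
        # errors="replace" keeps the loop going if a codec leaves a byte undefined.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs


# Bytes to which ISO-8859-x assigns printable characters (assumption of this sketch).
GRAPHIC_BYTES = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))

western = differing_bytes("cp1252", "iso-8859-1", GRAPHIC_BYTES)
central = differing_bytes("cp1250", "iso-8859-2", GRAPHIC_BYTES)

print("CP1252 vs ISO-8859-1:", len(western), "differing bytes")
print("CP1250 vs ISO-8859-2:", len(central), "differing bytes")

# A Polish letter written on Windows and read under an ISO-8859-2 locale.
misread = "ą".encode("cp1250").decode("iso-8859-2")
print("ą is displayed as", misread)
```

The first comparison finds no differences, while the second one does, which is why a Polish tag written on Windows looks wrong on a Linux player that trusts the locale.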
Kaffeine (http://kaffeine.sourceforge.net/), BEEP Media Player
(http://www.sosdg.org/~larne/w/BMP_Homepage, discontinued, maybe
shouldn't be mentioned at all), and Audacious Media Player
(http://audacious.nenolod.net/Main_Page) allow the character encoding of
the ID3 tags to be configured by the user and thus can display ID3v1
tags correctly, according to the de-facto standard. A patch that
provides the same functionality exists for XMMS; see
http://rusxmms.sourceforge.net/. Windows-based players may also work
under WINE.
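
What "configurable encoding" means in practice can be sketched as follows. This is not how any of the players above implement it. It is a minimal illustration that assumes the usual ID3v1 layout (the last 128 bytes of the file: "TAG", then 30 bytes each for title, artist and album, 4 bytes for the year, 30 for the comment and 1 for the genre). The function name `read_id3v1` and its `encoding` parameter are hypothetical.

```python
def read_id3v1(data, encoding="cp1252"):
    """Parse an ID3v1 tag from the end of `data`, decoding text with `encoding`.

    Returns a dict of text fields, or None if no ID3v1 tag is present.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(start, length):
        raw = tag[start:start + length]
        # Fields are padded with NUL bytes (some writers use spaces instead).
        raw = raw.split(b"\x00", 1)[0].rstrip(b" ")
        return raw.decode(encoding, errors="replace")

    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
    }


def read_id3v1_file(path, encoding="cp1252"):
    """Convenience wrapper: read a file and parse its ID3v1 tag, if any."""
    with open(path, "rb") as handle:
        return read_id3v1(handle.read(), encoding)
```

A call such as `read_id3v1_file("song.mp3", encoding="cp1250")` (the file name is just a placeholder) would then show Polish titles written on Windows correctly, regardless of the locale the player runs in.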
For ID3v2.x and OGG tags, the problem described above usually doesn't
exist (but some MP3 files with broken ID3v2 tags can be found on the
Internet). In ID3v2.x tags, the character encoding used is specified in
the tag itself, and the specification for OGG tags allows only UTF-8.
Players usually follow the specifications, convert encodings as
necessary and display the text correctly.
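
For illustration, this is roughly what "specified in the tag itself" looks like for an ID3v2 text frame: the first byte of the frame body selects the encoding (0 for ISO-8859-1, 1 for UTF-16 with a byte order mark, and, in ID3v2.4 only, 2 for UTF-16BE and 3 for UTF-8). The sketch handles only the frame body, not the tag or frame headers, and the names `ID3V2_ENCODINGS` and `decode_text_frame` are invented for this example.

```python
# Encoding byte -> Python codec name (values 2 and 3 exist only in ID3v2.4).
ID3V2_ENCODINGS = {
    0: "iso-8859-1",
    1: "utf-16",      # byte order mark decides the byte order
    2: "utf-16-be",
    3: "utf-8",
}


def decode_text_frame(body):
    """Decode the body of an ID3v2 text frame (encoding byte followed by text)."""
    if not body:
        return ""
    codec = ID3V2_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown ID3v2 text encoding byte: {body[0]}")
    text = body[1:].decode(codec, errors="replace")
    # Some writers append a terminating NUL character; drop it.
    return text.rstrip("\x00")


for sample in (b"\x00\xe5", b"\x03\xc3\xa5", b"\x01" + "å".encode("utf-16")):
    print(sample, "->", decode_text_frame(sample))
```

All three sample bodies decode to the same character, which is the point: the reader does not have to guess. A broken tag is typically one where the encoding byte says ISO-8859-1 but the text was actually written in some other 8-bit code page.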
--
Alexander E. Patrakov