As promised, I have prepared some text about locale-related (i.e., not only UTF-8) problems with MP3 players. Sound is always OK, don't worry about it :)

Many MP3 files include metadata containing the song title, album and artist name. This information is stored in a so-called ID3 tag. There are two types of ID3 tags: ID3v1 (not formally standardized) and ID3v2.x (see the formal standard at http://www.id3.org/).

Audio players that show ID3 tags to the user must interpret bytes that form the tags as characters. Results depend upon the character encoding used for such interpretation. E.g., the sequence of bytes 0xC3 0xA5 means the character å in the UTF-8 encoding, the characters Ã¥ in ISO-8859-1, ĂĽ in ISO-8859-2 and so on. For the user to be able to read the strings in the tags, the application that originally created the tag and the program used for its display must agree upon the same encoding.
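The byte example above can be checked with a short Python sketch. The helper name `interpretations` is made up for illustration; the codec names are the ones Python's standard library uses for these encodings.

```python
# The two bytes from the example above.
raw = bytes([0xC3, 0xA5])

def interpretations(data, encodings=("utf-8", "iso-8859-1", "iso-8859-2")):
    """Decode the same bytes under several encodings (illustrative helper)."""
    return {enc: data.decode(enc) for enc in encodings}

# Print one line per encoding so the differences are visible side by side.
for enc, text in interpretations(raw).items():
    print(f"{enc:12} -> {text}")
```

The bytes never change; only the table used to map them to characters does, which is why the tag writer and the player have to agree on it.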

(Yes, I know that the ISO-8859-2 example creates XML problems for the book. In HTML, the preferred form for all strange characters mentioned above would be å Ã¥ Ă Ľ. The first three characters can also be inserted directly. If ISO-8859-2 characters create problems with PDF, remove the ISO-8859-2 example.)

The de-facto standard for ID3v1 tags is the character encoding used by MS Windows in the relevant country, because there is no formal standard, and because WinAMP, one of the most popular MP3 players, cannot display anything else. This de-facto standard is also honoured by some hardware MP3 players, e.g., the HanBiT XDRUM XD-405. Most MP3 players for Linux, however, assume the character set of the current locale by default.

In ISO-8859-1 based locales, this assumption is harmless because the CP1252 code page, used by Windows in those countries, is a superset of ISO-8859-1 (i.e., for every byte for which ISO-8859-1 assigns a character, CP1252 assigns the same character). In other countries, where Windows and Linux use very different character sets (e.g., in Poland, where Windows uses CP1250 and Linux uses ISO-8859-2), or on Linux systems that use UTF-8 locales, this assumption leads to incorrect results.
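A rough way to see the difference between the two cases, again in Python. The helper `same_high_range` is hypothetical, and it compares only the bytes 0xA0 to 0xFF, on the assumption that this is the range where ISO-8859-x encodings assign printable non-ASCII characters. The sample string is two Polish letters chosen for illustration.

```python
def same_high_range(codec_a, codec_b):
    """True if both codecs decode every byte 0xA0..0xFF to the same character."""
    return all(
        bytes([b]).decode(codec_a) == bytes([b]).decode(codec_b)
        for b in range(0xA0, 0x100)
    )

# Western European case: the Windows and Linux tables agree in this range.
print(same_high_range("cp1252", "iso-8859-1"))

# Polish case: the Windows and Linux tables differ.
print(same_high_range("cp1250", "iso-8859-2"))

# A tag written on Windows (CP1250) and read with two wrong assumptions.
tag_bytes = "ąę".encode("cp1250")
as_latin2 = tag_bytes.decode("iso-8859-2")               # wrong letter, no error
as_utf8 = tag_bytes.decode("utf-8", errors="replace")    # invalid UTF-8
print(as_latin2, as_utf8)
```

In the ISO-8859-2 case the player silently shows a different letter; in the UTF-8 case the bytes are not even valid, so a player has to substitute replacement characters or give up.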

Kaffeine (http://kaffeine.sourceforge.net/), BEEP Media Player (http://www.sosdg.org/~larne/w/BMP_Homepage, discontinued, maybe shouldn't be mentioned at all), and Audacious Media Player (http://audacious.nenolod.net/Main_Page) allow the character encoding of the ID3 tags to be configured by the user and thus can display ID3v1 tags correctly, according to the de-facto standard. A patch exists for XMMS that provides the same functionality, see http://rusxmms.sourceforge.net/. Windows-based players may also work under WINE.
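What a player with a configurable tag encoding has to do can be sketched as follows. This is not the code of any of the players named above, only an illustration under the usual ID3v1 layout: the last 128 bytes of the file, starting with "TAG", followed by 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre. The names `parse_id3v1` and `make_field` and the sample data are invented for the example.

```python
def parse_id3v1(tail, encoding):
    """Parse an ID3v1 tag from the last 128 bytes of a file.

    `encoding` stands in for the user-configured setting; the tag
    itself carries no encoding information.
    """
    if len(tail) != 128 or tail[:3] != b"TAG":
        return None

    def text(field):
        # Fields are padded with NUL bytes (or spaces) up to their fixed size.
        return field.split(b"\x00", 1)[0].decode(encoding, errors="replace").rstrip()

    return {
        "title": text(tail[3:33]),
        "artist": text(tail[33:63]),
        "album": text(tail[63:93]),
        "year": text(tail[93:97]),
    }

def make_field(value, encoding, size=30):
    """Build a fixed-size, NUL-padded field (used only to fake a sample tag)."""
    return value.encode(encoding).ljust(size, b"\x00")

# A fake tag as a Windows program in Poland might have written it (CP1250).
sample = (
    b"TAG"
    + make_field("Gąska", "cp1250")
    + make_field("Artist", "cp1250")
    + make_field("Album", "cp1250")
    + b"2006"
    + make_field("", "cp1250")   # comment
    + b"\xff"                    # genre byte
)

print(parse_id3v1(sample, "cp1250")["title"])      # configured per the de-facto standard
print(parse_id3v1(sample, "iso-8859-2")["title"])  # locale default on a Polish Linux system
```

With a real file, `tail` would be obtained by seeking to 128 bytes before the end and reading from there.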

For ID3v2.x and OGG tags, the problem described above usually doesn't exist (although some MP3 files with broken ID3v2 tags can be found on the Internet). In ID3v2.x tags, the character encoding used is specified in the tag itself, and the specification for OGG tags allows only UTF-8. Players usually follow the specifications, convert encodings as necessary and display the text correctly.
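For comparison, here is a minimal sketch of how the encoding marker in an ID3v2 text frame can be honoured. It assumes the frame body has already been extracted (header parsing is omitted) and that its first byte is the encoding marker: 0 for ISO-8859-1, 1 for UTF-16 with a byte order mark, and, in ID3v2.4, 2 for UTF-16BE and 3 for UTF-8. The function name `decode_text_frame` is made up for the example.

```python
# Encoding marker byte -> Python codec name.
ID3V2_ENCODINGS = {
    0: "iso-8859-1",
    1: "utf-16",      # byte order mark decides the endianness
    2: "utf-16-be",   # ID3v2.4 only
    3: "utf-8",       # ID3v2.4 only
}

def decode_text_frame(body):
    """Decode the body of an ID3v2 text frame (marker byte + encoded text)."""
    codec = ID3V2_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding marker: {body[0]}")
    # Strip the optional NUL terminator after decoding.
    return body[1:].decode(codec).rstrip("\x00")

print(decode_text_frame(b"\x03" + "Gąska".encode("utf-8")))
print(decode_text_frame(b"\x01" + "Gąska".encode("utf-16") + b"\x00\x00"))
```

No user setting is needed here because the tag says which table to use. A broken tag is typically one where the marker claims ISO-8859-1 but the bytes are really in a Windows code page.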

--
Alexander E. Patrakov
