On Fri, Apr 13, 2007 at 14:35:24 +0200, Brice Goglin wrote:
> Tino Keitel wrote:
> > lltag always seems to decode non-ASCII characters as Latin1 characters, even
> > when the user has set a UTF8 locale:
> >   
> 
> I was actually expecting some UTF8 related problems since people
> requesting ID3v2 used the UTF-8 support. But I wanted to get early
> feedback about ID3v2 before dealing with all related issues, so I didn't
> really bother trying to make the UTF-8 support very good.

Well, I don't know much about the handling of different encodings in
ID3v2 tags. I'll try to learn how this is handled. AFAIK this is
broken by design in ID3v1, which does not record the encoding of its
strings at all.

> 
> >     Track 06: Mu� ja
> >     Track 07: H�nde hoch, Papa                                         
> >     Track 08: Fin de mill�naire                                        
> >   
> 
> I didn't think about CDDB. Looks like the CDDB site I use returns
> non-UTF8 encoding. I need to keep that in mind.
> 
> > Here are the correct filenames:
> >
> > 06 - Muß ja.mp3
> > 07 - Hände hoch, Papa.mp3
> > 08 - Fin de millénaire.mp3
> >
> > The ID3v2 tag will also get a Latin1 encoding instead of UTF8. Here is the
> > id3v2 -l output:
> >
> > TALB (Album/Movie/Show title): Economy Class
> > TCON (Content type): Rock (17)
> > TIT2 (Title/songname/content description): Mu� ja
> >   
> 
> I am not sure you can trust id3v2 here. lltag currently does not pass
> any encoding to MP3::Tag, which means the encoding is set to non-UTF8 by
> default. So I don't think there's an inconsistency between the encoding
> that is set and the actual tag string encoding, both are non-UTF8. Then,
> id3v2 might be displaying lltag's non-UTF-8 tags without translating
> them into your UTF-8 locale.

I played around with id3v2 before, when I had problems with Latin1 ID3
tags with the Rhythmbox player, and id3v2 was always correct. Both UTF8
and Latin1 tags were translated correctly to my UTF8 locale. However, I
don't know if this was magic done by id3v2, or if the encoding is
specified in the tag.
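
On the question of whether the encoding is specified in the tag: in
the ID3v2 specification, the body of a text frame such as TIT2 starts
with one byte that names the encoding of the text that follows. Below
is a minimal Python sketch of how a reader could use that byte. The
function name and the sample frames are made up for illustration; this
is not code from lltag, MP3::Tag or id3v2.

```python
# Encoding byte values defined by the ID3v2 specification for text frames.
ID3V2_TEXT_ENCODINGS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with a byte order mark
    2: "utf-16-be",  # UTF-16BE without a byte order mark (ID3v2.4 only)
    3: "utf-8",      # UTF-8 (ID3v2.4 only)
}


def decode_text_frame(body: bytes) -> str:
    """Decode a text frame body (hypothetical helper, not a library call)."""
    if not body:
        return ""
    # The first byte selects the codec, the rest is the encoded text.
    codec = ID3V2_TEXT_ENCODINGS[body[0]]
    text = body[1:].decode(codec)
    # Frames may carry a trailing terminator; drop it.
    return text.rstrip("\x00")


# Hand-built sample frame bodies with the same title in two encodings.
latin1_frame = b"\x00" + "Muß ja".encode("latin-1")
utf8_frame = b"\x03" + "Muß ja".encode("utf-8")

print(decode_text_frame(latin1_frame))
print(decode_text_frame(utf8_frame))
```

If a tool honours this byte, it can show both Latin1 and UTF8 tags
correctly without guessing, which would match what I saw with id3v2.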

> Anyway, it looks like I should
> * know that CDDB returns non-UTF-8
> * convert its result to UTF-8 if the locale is UTF-8
> * use the converted values for both displaying and tagging

I think tagging with non-UTF-8 is OK, as long as the encoding is
specified in the tag (if this is possible at all).
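
The conversion steps from the quoted list could look roughly like the
following Python sketch. It assumes that the CDDB server really
returns Latin1, which is only a guess based on the output above. The
function names are hypothetical and this is not how lltag does it.

```python
import codecs
import locale


def locale_is_utf8(encoding=None):
    """Return True if the given (or current) locale encoding is UTF-8."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
    return codecs.lookup(encoding).name == "utf-8"


def convert_cddb_string(raw, source="latin-1", target=None):
    """Re-encode raw CDDB bytes (assumed Latin1) for the target encoding."""
    if target is None:
        target = locale.getpreferredencoding(False)
    text = raw.decode(source)
    # Characters that the target cannot represent are replaced by "?".
    return text.encode(target, errors="replace")


# Track titles as Latin1 bytes, as a CDDB server might send them.
raw_titles = [b"Mu\xdf ja", b"H\xe4nde hoch, Papa", b"Fin de mill\xe9naire"]

for raw in raw_titles:
    converted = convert_cddb_string(raw, target="utf-8")
    print(converted.decode("utf-8"))
```

The converted value would then be used both for the display and for
the tag, so the two cannot disagree.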

> Then several questions need to be raised:
> * should I assume that the filename, the tags and the current locale are
> all the same (I mean all UTF-8 or all non-UTF8)?

Filenames and Tags should be the same, but the current locale may be
set to something different.

> * if your current locale is UTF8 while the file contains non-UTF8 tags,
> do I convert them into UTF-8 ?

Hum, I don't really care about filename -> tag translation, more about
the other way round. But I think that the tags should get the same
encoding as the filenames by default. Depending on a system- and
user-dependent locale setting here is a broken concept IMHO. But other
people might think differently about this.

> * do we need options to convert filenames from/to UTF8 when
> reading/renaming and tags from/to UTF8 when reading/tagging? It would
> mean a lot of new command line options...
> * what about other filetype? OGG vorbis seems to use UTF8 by default, I
> should fix lltag then. It might be the same for FLAC. But, I don't know
> what to do with ID3v1.
> 
> So yes, I am going to work on all this, but I need to think a lot first :)

Great, thanks. Meanwhile, I wrote a little script that converts Latin1
encoded tags to UTF8, so I can wait. :-)

Regards,
Tino
