Hi,

I've been working on UTF-8 support and stumbled across the following issue: in iso9660_fs.c I replaced the function ucs2be_to_locale() with a new, more generic cdio_charset_to_utf8().
While trying to figure out why my routine fails, I found that the original version fails as well. Digging deeper, it seems that there is a +/- 1 byte offset when several of the strings are read. When called by iso9660_ifs_get_application_id(), ucs2be_to_locale() gets the following data:

    20 00 4b 00 33 00 42 00 20 00 54 00 48 00 45 00   .K.3.B. .T.H.E.
    20 00 43 00 44 00 20 00 4b 00 52 00 45 00 41 00   .C.D. .K.R.E.A.
    54 00 4f 00 52 00 20 00 28 00 43 00 29 00 20 00  T.O.R. .(.C.). .
    31 00 39 00 39 00 38 00 2d 00 32 00 30 00 30 00  1.9.9.8.-.2.0.0.
    35 00 20 00 53 00 45 00 42 00 41 00 53 00 54 00  5. .S.E.B.A.S.T.
    49 00 41 00 4e 00 20 00 54 00 52 00 55 00 45 00  I.A.N. .T.R.U.E.
    47 00 20 00 41 00 4e 00 44 00 20 00 54 00 48 00  G. .A.N.D. .T.H.
    45 00 20 00 4b 00 33 00 42 00 20 00 54 00 45 00  E. .K.3.B. .T.E.

Now, my knowledge about ISO 9660 is near zero, but I know for sure that the above sequence is not UCS-2BE. In big endian, a space ' ' is encoded as 0x00 0x20, not 0x20 0x00.

The same issue seems to exist in these functions:

    iso9660_ifs_get_preparer_id();
    iso9660_ifs_get_publisher_id();
    iso9660_ifs_get_volumeset_id();

The following functions work here:

    iso9660_ifs_get_system_id();
    iso9660_ifs_get_volume_id();

In iso-info, these bugs don't show up because the respective strings are either not shown or come from "somewhere else".

Can anyone help here? My UTF-8 patch is practically finished, but I would like to get these issues resolved.

Thanks
Burkhard

_______________________________________________
Libcdio-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/libcdio-devel
