Joyride-277 doesn't validate, because it contains a file from the library whose filename is in non-normalized Unicode. The file is named 'Annobo?n_Bioko-thumb.jpg', where the ? should be a separate combining accent following the o, but the name is actually stored on disk with a single precomposed 'o+accent' character.
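To make the mismatch concrete, here is a minimal Python sketch. It is not the contents verifier's actual code, and the helper names (nfd, same_name, non_nfd_names) are made up for illustration. It assumes the accented character is U+00F3 in the precomposed spelling and 'o' followed by U+0301 in the decomposed one.

```python
import os
import unicodedata


def nfd(name):
    """Return `name` in Unicode Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", name)


def same_name(a, b):
    """Compare two filenames after normalizing both to NFD."""
    return nfd(a) == nfd(b)


def non_nfd_names(root):
    """Hypothetical helper: list paths under `root` whose final
    component is not already in NFD."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name != nfd(name):
                bad.append(os.path.join(dirpath, name))
    return sorted(bad)


# Precomposed spelling: one code point (U+00F3) for the accented o.
precomposed = "Annob\u00f3n_Bioko-thumb.jpg"
# Decomposed spelling: plain 'o' followed by a combining acute (U+0301).
decomposed = "Annobo\u0301n_Bioko-thumb.jpg"

# The two strings render identically but differ code point by code point,
# so a naive comparison fails while a normalized comparison succeeds.
naive_match = precomposed == decomposed
normalized_match = same_name(precomposed, decomposed)
```

Here naive_match is False and normalized_match is True, which is the difference between comparing raw strings and comparing normalized ones. Whether non_nfd_names sees the names exactly as stored depends on the filesystem and locale, so treat that part as a starting point only.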
Now, at first blush this is a bug in the (fast) contents verifier, which I will fix: all strings should be Unicode-normalized before they are compared. But it seems like this raises issues with (for example) URLs to library content.

Should we enforce the constraint that all filenames are Unicode-normalized on disk, so that we can guarantee that a (Unicode-normalized) URL will always resolve correctly? Otherwise we run the risk of someone editing a file and resaving it with a name which *appears* identical, but is actually encoded differently on disk, and having URLs to the file mysteriously break.

For the technically-minded, we're talking about using the UTF-8 encoding of Unicode Normalization Form D, as discussed (briefly) at http://wiki.laptop.org/go/Canonical_JSON.

The problem has arisen because the old libraries used normalized filenames, but we've switched to installing the libraries from RPMs, and apparently non-normalized filenames have snuck in. If I were to hazard a guess, I'd say that the tar command normalizes filenames as they are archived, while RPM does not.

My proposal is to ensure that all filenames in the base system (at least) are in Normalization Form D. I will write a checker in the build process to ensure this, and we should probably eventually write checkers for the activity/library bundle tools that will do the same.

 --scott

--
( http://cscott.net/ )
_______________________________________________
Devel mailing list
http://lists.laptop.org/listinfo/devel
