On Wed, Mar 16, 2005 at 04:55:06PM -0800, Don Armstrong wrote:
> On Wed, 16 Mar 2005, Colin Watson wrote:
> > I realise it's a database format change, but I'd really prefer to
> > have the metadata files be pure UTF-8, so that we don't have to
> > process them for display every time, and to make things like
> > searching easier. We can always write a migration script.
>
> I think that's the optimal solution too. However, this patch at least
> will work now, and we can move to pure UTF-8 later.

I've taken the approach of creating a new .summary format version; the
way the .summary file format works means that we can have
"Format-Version: 2" indicate RFC1522 metadata and "Format-Version: 3"
indicate UTF-8 metadata. I haven't yet made format version 3 the
default, but I will do in time. This made the code a lot simpler,
because metadata only needs to be decoded/encoded in the two functions
responsible for reading/writing .summary files.

I've checked this into CVS, along with some of the uses of
decode_rfc1522() from your patch and the changes to make bugreport.cgi
and pkgreport.cgi output UTF-8, and installed it on bugs.debian.org.
This means that at least maintainer and submitter addresses are now
displayed properly.

The .log metadata and mail character set fixes still need more work;
I'm almost inclined to introduce a new more structured record type to
replace html at the same time, and make that be encoded in UTF-8.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]
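
For illustration, the Format-Version dispatch described above could be
sketched roughly as follows. This is not the code that was checked in
(that code is Perl); it is a minimal Python sketch under stated
assumptions. The function read_summary(), the "Submitter" field used in
the example, and the simple "Key: value" line layout are assumptions
made for the example, and decode_rfc1522() here is a Python stand-in
built on the standard email.header module, not the function from the
patch.

```python
from email.header import decode_header


def decode_rfc1522(value):
    """Decode RFC 1522 encoded-words in a header value to a Unicode string.

    Stand-in for the Perl function of the same name mentioned in the thread.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unencoded fragments come back as bytes with charset None.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Values with no encoded-words come back as a plain str.
            parts.append(chunk)
    return "".join(parts)


def read_summary(raw):
    """Parse hypothetical "Key: value" summary data (bytes) into a dict.

    Format-Version 2: values may contain RFC 1522 encoded-words.
    Format-Version 3: values are already plain UTF-8.
    """
    fields = {}
    for line in raw.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    version = fields.get("Format-Version")
    if version == "2":
        # Decoding happens only here, at read time.
        fields = {k: decode_rfc1522(v) for k, v in fields.items()}
    elif version != "3":
        raise ValueError("unsupported Format-Version: %r" % (version,))
    return fields


if __name__ == "__main__":
    v2 = b"Format-Version: 2\nSubmitter: =?iso-8859-1?q?J=F6rg?=\n"
    v3 = "Format-Version: 3\nSubmitter: J\u00f6rg\n".encode("utf-8")
    print(read_summary(v2)["Submitter"])
    print(read_summary(v3)["Submitter"])
```

The point of the sketch is that callers always receive decoded Unicode
strings regardless of which version is on disk, so the version check
stays confined to the reader (and, symmetrically, the writer).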

