On Tue, Sep 02, 2003 at 06:18:39PM +0200, Josip Rodin wrote:
> On Tue, Sep 02, 2003 at 05:48:27AM +0200, Martin Godisch wrote:
> > > > Anyway I fail to see which problems arise with this proposal, could
> > > > someone enlighten me?
> > >
> > > It's too broad. Has anyone tested if the packaging system correctly
> > > processes double-byte information everywhere?
> >
> > I had no problems reading mbc descriptions with dpkg and apt so far. Is
> > there some special test I should do?
>
> Your proposal says "the control fields". Description is just one, what about
> all the others? (If it was your intent to only do this for descriptions, why
> doesn't the proposal say so?)

My understanding of the proposal is that if a field uses non-ASCII characters, its encoding should be UTF-8. It does not say that all fields may contain non-ASCII characters, which is why the current packaging tools do not need to be patched.

Currently 165 binary packages contain non-ASCII characters in Maintainer or Description fields; there are 56 binary packages with non-ASCII characters in their description (which means that you are responsible for 9% of this garbage ;)), and 26 maintainer names with such letters (but only 21 unique maintainers). This is an upper limit; some of these strings may already be UTF-8 encoded.

Denis
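
For illustration only, here is a rough sketch of how that upper limit could be narrowed by separating field values that are already valid UTF-8 from those in some other encoding. The function names `classify_field` and `count_by_encoding` are made up for this example, and it assumes the raw bytes of each Maintainer or Description value have already been extracted by some other means. Note that a legacy-encoded string can occasionally happen to be valid UTF-8 too, so the result is still only an estimate.

```python
def classify_field(raw: bytes) -> str:
    """Classify a raw control-field value as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: 'other' means the bytes are neither pure ASCII
    nor valid UTF-8 (for example, a Latin-1 encoded maintainer name).
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def count_by_encoding(values):
    """Count how many raw field values fall into each category."""
    counts = {"ascii": 0, "utf-8": 0, "other": 0}
    for raw in values:
        counts[classify_field(raw)] += 1
    return counts
```

Only the values counted under "other" would actually need converting; those under "utf-8" already follow the proposal.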

