Hi John,

On Wed, Feb 06, 2008 at 09:20:13PM +0000, John McCreesh wrote:
> Christian Lohmaier wrote:
> [snip]
> >So either your planet is at fault, or whatever editor you use to merge
> >the feed into the site is doing "clever" stuff to the data. If you
> >suspect cvs doing bad stuff with the file, compare the md5sums of the
> >file that you did commit and the file you can get via a checkout.
>
> I've got it, but I'm still confused. Look at:
>
> http://website.openoffice.org/tryouts/jpmcc/x.html and
> http://website.openoffice.org/tryouts/jpmcc/y.html
>
> If you View -> Page Source, you will see that they are both
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
> but the characters in x.html are corrupt.
>
> The difference is in the source I save in cvs.
> x.html has:
> <meta http-equiv="content-type" content="text/html; charset=us-ascii" />
> y.html has:
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>
> so it looks as though it is *the Collabnet CMS* that is trying to
> convert us-ascii into UTF-8 when it renders x.html...

As us-ascii is a true subset of UTF-8 (so no conversion is needed at all
if you trust the encoding specified), I'm quite surprised that this
matters. I should have tried with nonav before...

But OTOH, the file isn't us-ascii at all (so, as expected, the
aggregator was misconfigured). Probably CN checks whether the file is
7-bit clean and "escapes" the characters to avoid "attacks" or other
problems: it converts the characters outside us-ascii to numbered
entities, but of course disregards UTF-8's encoding scheme. E2 80 99
(UTF-8, hex) gets converted to three times the code for the unknown
character/replacement character.

> I have now changed the Planet template to create UTF-8 and all is well.
>
> Thanks for your help. I love this CMS ;-)

:-) You're welcome.

Actually, I'm not sure whether one can blame CEE here: that UTF-8
sequence doesn't correspond to ASCII characters. If the data were ASCII,
the sequence would correspond to 3 individual characters, so what would
be more logical than to add 3 "replacement" markers in that case?
One surely could argue about that though ;-)

ciao
Christian
--
NP: Korn - Fake
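
To make the byte-level point above concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration of what the CMS appears to do, not its actual code: the function names are hypothetical, and the choice of &#65533; (U+FFFD) as the "replacement character" entity is an assumption based on the description above.

```python
# Assumption: the CMS emits the numbered entity for U+FFFD (replacement
# character) for every byte it cannot interpret as us-ascii.
REPLACEMENT_ENTITY = "&#65533;"


def is_7bit_clean(data: bytes) -> bool:
    """True if every byte is in the us-ascii range 0x00-0x7F."""
    return all(b < 0x80 for b in data)


def escape_bytewise(data: bytes) -> str:
    """Hypothetical CMS behaviour: trust the declared us-ascii charset and
    replace each byte outside that range with a replacement entity."""
    return "".join(chr(b) if b < 0x80 else REPLACEMENT_ENTITY for b in data)


def escape_utf8_aware(data: bytes) -> str:
    """Decode as UTF-8 first, then emit one numbered entity per character."""
    text = data.decode("utf-8")
    return "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in text)


# E2 80 99 is the UTF-8 encoding of U+2019 (right single quotation mark).
sample = b"it\xe2\x80\x99s"

print(is_7bit_clean(sample))      # False
print(escape_bytewise(sample))    # it&#65533;&#65533;&#65533;s
print(escape_utf8_aware(sample))  # it&#8217;s
```

The byte-wise variant yields three replacement markers for the one three-byte character, which matches what x.html showed; the UTF-8-aware variant yields a single entity. Pure us-ascii input passes through both functions unchanged, which is why declaring UTF-8 in the template is harmless.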
