Hello Stian,

Stian Soiland-Reyes <[email protected]> wrote on Mon, 26 Sep 2016 at 01:45:
> As I mentioned in the BeanUtils vote, its RELEASE-NOTES.txt was in
> character set ISO-8859-1, instead of say UTF-8 (to represent the name
> "Tommy Tynjä").
>
> However the RELEASE-NOTES are special in that they go into git/svn and
> thus the release zip/tar.gz, but also we copy them into the dist
> download area - see for instance
>
> > http://www.apache.org/dist/commons/collections/RELEASE-NOTES-4.0.txt
>
> which (if you search for COLLECTIONS-8) should say correctly with
> Norwegian O-slash:
>
> > Thanks to Rune Peter Bjørnstad.
>
> but instead might (as in my Chromium browser) be shown incorrectly in
> "WTF8":
>
> > Thanks to Rune Peter BjÃ¸rnstad.
>
>
> This is because the file (at least from www.apache.org) is served as just:
>
> Content-Type: text/plain
>
> e.g. character set ISO 8859-1 (Latin 1).
>
>
> (Different mirrors might have a different AddDefaultCharset set -
> http://www.apache.org/info/how-to-mirror.html does not mandate any)
>
>
> I think we should correctly cater for any non-latin1-names in our
> release notes - people should be thanked by their real names -- not
> everyone wants to legally change their name to an ASCII-compatible
> version (says formerly "Stian Søiland").
>
>
> So I had a look at the immediate files in dist, and found these
> non-ASCII text files:
>
> stain@biggiebuntu:~/src/95/commons$ find . -type f | grep -v .svn |
> xargs file | grep -v ASCII
>
> ./bcel/RELEASE-NOTES.txt:            UTF-8 Unicode text
> ./email/RELEASE-NOTES.txt:           UTF-8 Unicode text
> ./codec/RELEASE-NOTES.txt:           ISO-8859 text, with CRLF line terminators
> ./logging/RELEASE-NOTES.txt:         UTF-8 Unicode text
> ./cli/RELEASE-NOTES.txt:             ISO-8859 text
> ./beanutils/RELEASE-NOTES.txt:       C++ source, ISO-8859 text
> ./collections/RELEASE-NOTES.txt:     UTF-8 Unicode text
> ./collections/RELEASE-NOTES-4.0.txt: UTF-8 Unicode text
> ./compress/RELEASE-NOTES.txt:        UTF-8 Unicode text
> ./lang/RELEASE-NOTES.txt:            ISO-8859 text
>
>
> I propose we add a default commons/.htaccess which sets something like:
>
> AddCharset UTF-8 .txt .html
>
> ..and convert the ISO-8859 ones to UTF-8; (checking manually they are
> latin 1 and not any of the other latin variants). We should fix both
> in dist and git/svn to avoid regression.
>
>
> As various .htaccess files are already in operation across dist (I
> found at least 20, including under httpd), so I think this should be
> OK.
>
>
> For the BeanUtils 1.9.3 release I thus added such an .htaccess - then
> we can see if that breaks anything on the mirrors. So far so good:
>
> stain@biggiebuntu:~/src/95$ curl -s -I
> http://www.apache.org/dist/commons/beanutils/RELEASE-NOTES.txt | grep
> Content-Type
> Content-Type: text/plain; charset=utf-8
>
>
>
> Views..?
>

Thank you for the thorough analysis. I agree with your proposal to add
.htaccess to get the charset right.

Thank you,
Benedikt

>
> --
> Stian Soiland-Reyes
> http://orcid.org/0000-0001-9842-9718
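
For anyone who wants to reproduce the effect described above, here is a minimal Python sketch. It shows how UTF-8 bytes look when a client decodes them as ISO-8859-1, and how an ISO-8859-1 file could be re-encoded as UTF-8. The helper names `mojibake` and `latin1_to_utf8` are made up for illustration and are not part of any Commons tooling. The conversion assumes the input really is Latin-1 and not one of the other ISO-8859 variants, which still has to be checked by hand as suggested in the quoted mail.

```python
# Sketch only: helper names are hypothetical, not existing Commons tooling.


def mojibake(text: str) -> str:
    """Return text as it appears when its UTF-8 bytes are decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Bytes that already decode cleanly as UTF-8 (including plain ASCII)
    are returned unchanged, so the function is safe to run twice.
    Assumption: anything that is not valid UTF-8 is Latin-1.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    # What a browser shows when a UTF-8 file is served without a charset.
    print(mojibake("Thanks to Rune Peter Bjørnstad."))
    # A Latin-1 encoded name converted to UTF-8 bytes and decoded again.
    print(latin1_to_utf8("Tommy Tynjä".encode("iso-8859-1")).decode("utf-8"))
```

Converting the files fixes only half of the problem: the server still has to announce `charset=utf-8`, which is what the proposed `AddCharset` line in .htaccess is for.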
