On 11/02/2020 14:26, Christopher Schultz wrote:

<snip/>

>> The thing that bugged me was having to manually switch properties
>> files to UTF-8 to view them "properly". Your mail motivated me to
>> track down where I can change that in Eclipse:
>
>> Window->Preferences->General->Content Types
>
>> and I have changed Java properties files to use UTF-8. So that is
>> my personal niggle fixed. Thanks for the motivation.
>
> Yes, this *will* fix things, but:
>
> 1. It's a global setting, so it can't be set on a per-project basis.
> That means you have to be willing to convert ALL your properties files
> across ALL your projects to UTF-8. That may be okay for some people,
> but not all.

Fair point.

> 2. This is a guess: Tomcat's ide-eclipse ant target can't set that
> setting for the Tomcat project(s) because it's a global setting.
> Therefore, anyone using Eclipse as an IDE will have to manually set
> their content-type in order to NOT damage any of the files we ship.

I'm not sure about actual damage. I've seen Eclipse manipulate UTF-8
files while configured to use ISO-8859-1 without issue. But maybe that
is actually git doing UTF-8 manipulation.

>> I was concerned that adding a BOM would cause problems when
>> reading property files. I've seen reports of that with Java in the
>> past. A quick test suggests that the issue is no longer present
>> with latest Java 8.
>
> I actually had another problem after I implemented all of this: any
> property file without a blank and/or comment line at the top ended up
> with a mangled and unusable *first* property key. A file like this:
>
> first.property=foo
> second.property=bar
>
> Would end up like this after a trip through "native2ascii -encoding
> UTF-8":
>
> \ufefffirst.property=foo
> second.property=bar

That is similar to the problems I recall with earlier versions of Java.

> native2ascii stupidly interprets the UTF-8 BOM as an actual character,
> and encodes it in the output.
>
> This appears to be a bug in (at least old versions of) Java and/or
> native2ascii.
> I've got local installations of Java 8, 11 (Adopt), 11
> (Oracle), and 13 (OpenJDK), and only Java 8 has a "native2ascii"
> binary present. I see ant's <native2ascii> task has its own
> implementation, but it's probably very simple, just like the
> native2ascii program itself. Java's Reader classes incorrectly
> interpret the BOM as an actual character instead of an ignorable UTF-8
> control sequence.

But the chances of us being able to "fix" the Ant implementation are
considerably higher :).

> Ensuring that the first line of the file is a comment or a blank line
> fixes things:
>
> # BOM
> first.property=foo
> second.property=bar
>
> becomes:
>
> \ufeff# BOM
> first.property=foo
> second.property=bar

Does the BOM end up creating an additional property in this case?

>> Overall, I guess I am -0 on adding BOMs.
>
> Okay. This is a fairly recent change to Tomcat, and frankly, we (a)
> don't get a huge number of outside contributions which include changes
> to the localized properties files (except for the translation-only
> contributions, which have been great!) and (b) often ignore the
> non-English translations in the first place because we are lazy.
>
> I think maybe this can stay on the back-burner until we see if we end
> up with any problems.

Sounds reasonable to me. It looks like we have options if we need them
but with a few minor issues to research / iron out first if we go that
way.

> Does/can "checkstyle" check for valid UTF-8 byte sequences in
> .properties files? I think that may be a helpful check to add if it's
> not already in there.

Don't know. +1 if such a thing exists.

Mark
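
For reference, a minimal Java sketch touching the two open questions
above: whether the escaped BOM line ends up as an extra property, and
what a UTF-8 validity check on .properties files could look like.
PropsBomCheck and its method names are hypothetical, made up for
illustration; they are not part of Tomcat, Ant, or checkstyle. It only
relies on java.util.Properties.load(Reader) and a strict CharsetDecoder,
and it says nothing about how checkstyle itself might host such a check.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for illustration only; not Tomcat, Ant, or checkstyle code.
class PropsBomCheck {

    /** Returns true if the bytes decode as strict UTF-8 (no malformed sequences). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if the bytes start with the UTF-8 BOM (EF BB BF). */
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Parses properties text with the standard java.util.Properties rules. */
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Mimics the converted file from the mail: the BOM appears as a
        // literal six-character escape in front of the comment marker.
        Properties p = load("\\ufeff# BOM\nfirst.property=foo\nsecond.property=bar\n");
        System.out.println("entries: " + p.size());
        System.out.println("first.property intact: "
                + "foo".equals(p.getProperty("first.property")));
        // Report any key that begins with U+FEFF, i.e. a stray BOM property.
        for (String key : p.stringPropertyNames()) {
            if (!key.isEmpty() && key.charAt(0) == 0xFEFF) {
                System.out.println("stray BOM key with value: " + p.getProperty(key));
            }
        }
    }
}
```

Running main against the JDK in use shows the entry count and any key
that starts with U+FEFF, which should settle the extra-property question
for that Java version. isValidUtf8 could be applied to the raw bytes of
each .properties file as a build-time check, with startsWithUtf8Bom
flagging files that would trip up the conversion step.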