I'm currently working on a legacy web project that uses a bunch of System.out.println calls for logging and debugging. It was plagued by encoding problems, and I fixed most of them by forcing everything to UTF-8. There is no sense in using the platform default encoding when the text output is meant to be embedded into HTML, and juggling and converting strings between different encodings is what I consider a form of torture. Reading text/config/HTML/Java/whatever files written in different encodings and having to guess the correct encoding on a case-by-case basis makes the torture even worse.
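To illustrate what I mean by forcing everything to UTF-8: a minimal sketch (the sample string and class name are illustrative; only standard java.io/java.nio APIs are used) of always naming the charset explicitly instead of calling the overloads that silently use the platform default:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    public static void main(String[] args) throws Exception {
        String text = "a\u00e7\u00e3o"; // "ação", Portuguese, with diacritics

        // Write naming the charset explicitly, never via the platform default.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(text);
        }

        // Read it back, again naming the charset explicitly.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(sink.toByteArray()), StandardCharsets.UTF_8))) {
            // Round-trips on any machine, regardless of -Dfile.encoding.
            System.out.println(text.equals(r.readLine())); // prints true
        }
    }
}
```

The same pattern applies to FileReader/FileWriter, which until recently had no charset-taking constructors at all and are exactly the kind of API the JEP proposes to deprecate.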
Also, I did the Portuguese translation of Checkstyle, and the translation files are encoded in UTF-8. When I use them in my project, Checkstyle prints localized messages about code-style violations, and those come out garbled in the NetBeans console. I don't know if this is a Checkstyle bug, but even if it is, having to do encoding checks and conversions just to println a String for debugging purposes is a burden no programmer deserves. And even if it is a Checkstyle bug and someone fixes it, there are probably millions of other tools out there with the same bug.

OK, NetBeans can't fix a bunch of third-party tools that have encoding problems. However, this shows that the hole here is much bigger: the mere existence of a platform default encoding is the root cause of all those problems. Even a simple System.out.println statement may suffer from it, because you don't know and can't know (and ideally shouldn't need to know or care) whether the String is going to the console, to a file, to a socket or to anywhere else. It only works safely when all the strings produced are born, live and die within the same machine, an assumption that is, and always was, simply wrong.

This is why I strongly support the idea of deprecating any methods that rely on the platform default encoding. NetBeans could also do its part by never relying on it.

Victor Williams Stafusa da Silva

2018-04-21 3:53 GMT-03:00 Tim Boudreau <[email protected]>:

> No argument that the situation doesn't need a fix.
>
> But you didn't answer my question: *What* are you running when the problem
> shows up? Your own Java project? If so, Ant, Maven or something else
> (i.e. build system where this is settable/detectable or not?)? Or some
> application server or third party thing?
>
> What I'm trying to nail down is, what is the point of minimal intervention
> where this could either be detected or made settable.
> External processes in Java have binary output; the IDE decides what
> character set to impose over that. That decision can be improved, but not
> without knowing where the stream is coming from; there's no place to
> start. Without knowing what it is you're looking at the output of when you
> see this problem, there's no progress to be made.
>
> I'm all for UTF-8 everywhere in theory, and on my own systems, but
> defaulting to that is likely to break things for at least as many people
> as it helps. So, in the interest of solving it with a scalpel instead of
> a sledgehammer, could you give a little detail on where the problematic
> output is coming from and how it is generated?
>
> Thanks,
>
> -Tim
>
> On Fri, Apr 20, 2018 at 8:33 PM, Victor Williams Stafusa da Silva <
> [email protected]> wrote:
>
> > In my case, I'm running on Windows, with the dreaded and hated
> > Windows-1252 default encoding.
> >
> > Using the default OS encoding is really bad for portability and causes
> > a lot of encoding problems. See this JEP draft, maybe for Java 11:
> > http://openjdk.java.net/jeps/8187041 - there are three proposed
> > alternatives: 1) keep the status quo; 2) deprecate all the methods that
> > use the platform default encoding; 3) force UTF-8 to be the default
> > regardless of anything.
> >
> > As a speaker of Portuguese, a language that is full of diacritics, I'm
> > already very sick of years and years of being haunted by encoding
> > problems in buggy software. But it could be much worse if my language
> > were Chinese or Japanese.
> >
> > Since option 1 is unacceptable and 3 is too drastic and dangerous due to
> > backwards-compatibility concerns, I think that this JEP, if it
> > eventually gets delivered, will go with option 2.
> >
> > Anyway, regardless of this JEP or its future, NetBeans should either get
> > the correct encoding in the console window or at least provide an easy
> > and accessible way to let the user define it.
> >
> > Victor Williams Stafusa da Silva
> >
> > 2018-04-20 20:00 GMT-03:00 Tim Boudreau <[email protected]>:
> >
> > > Your problem is most likely your operating system's default file
> > > encoding here (perhaps MacRoman?). The IDE is assuming that process
> > > output is whatever your operating system's default encoding is, which
> > > is the right assumption, since that *is* what command-line utilities
> > > will output. It happens that the process you're running is outputting
> > > UTF-8 *rather than* the OS's default encoding.
> > >
> > > Setting that as a default would be assuming that every operating
> > > system uses UTF-8 regardless of what it does - it would be wrong a lot
> > > of the time. It just happens to solve the case that whatever you're
> > > running is outputting UTF-8 in spite of what the operating system
> > > provides.
> > >
> > > That's not that uncommon, but the right solution is to *detect* that
> > > the output is UTF-8 when the IDE runs whatever it is you're running.
> > >
> > > So... what are you running? Is this project output? If so, what kind
> > > of project? Or server output of some kind? A correct fix would be to
> > > (if possible) detect what that is and that it will output UTF-8, and
> > > have the IDE open the output of that process with the right encoding.
> > >
> > > -Tim
> > >
> > > On Fri, Apr 20, 2018 at 6:18 PM, Victor Williams Stafusa da Silva <
> > > [email protected]> wrote:
> > >
> > > > I have frequently had long-standing problems with the console output
> > > > encoding in NetBeans, which always presented garbled non-ASCII
> > > > characters for me.
> > > >
> > > > After deciding that enough was enough, I went searching for a
> > > > solution and found a very simple one on StackOverflow. Just add
> > > > -J-Dfile.encoding=UTF-8 to the netbeans_default_options line of the
> > > > netbeans.conf file and voilà, it works!
> > > >
> > > > However, this makes me think:
> > > >
> > > > 1. Is there a reason not to add it there by default?
> > > >
> > > > 2. If it can't be added there by default for some reason, can it at
> > > > least be something more user-friendly and less arcane to be
> > > > configured by the normal user?
> > > >
> > > > Victor Williams Stafusa da Silva
> > >
> > > --
> > > http://timboudreau.com
> >
>
> --
> http://timboudreau.com
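The mismatch the thread describes can be reproduced without NetBeans at all; a minimal sketch (class and variable names are illustrative, with an in-memory buffer standing in for the console, using only standard java.io APIs) of a PrintStream whose charset is fixed per stream rather than inherited from the platform:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConsoleEncoding {
    public static void main(String[] args) throws Exception {
        String text = "a\u00e7\u00e3o"; // "ação"; ç and ã are the characters that get garbled

        // A PrintStream built with an explicit charset; the in-memory buffer
        // stands in for the console stream the IDE would read from.
        ByteArrayOutputStream console = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(console, true, "UTF-8");
        out.print(text);

        // The emitted bytes are UTF-8 on every platform; a reader that
        // assumes windows-1252 would render two mojibake characters for
        // each accented letter instead.
        System.out.println(Arrays.equals(
                console.toByteArray(), text.getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

The -J-Dfile.encoding=UTF-8 workaround quoted above changes the JVM-wide default that a plain `new PrintStream(out, true)` would otherwise pick up; naming the charset in the constructor makes the stream independent of that setting.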
