Hi! Perhaps core-libs-dev is the more appropriate mailing list, but there are hundreds of posts a month there. I'm not sure whether the severity of the problem would require a more fundamental solution. So I will reply here as well, because I see that Brian Goetz reviewed and endorsed JEP 400, and I would like to see him endorse a rescue action as well, to keep my trust in the Java platform. If there is a priority higher than CRITICAL, this issue deserves it. If left as it is, this JEP will completely ruin Java as a starting language, because no simple beginner's example from the web will work anymore and any new one will look extremely complicated. Reinier described it very realistically.
Just a side note on java.io.Console: this class does not work in IDEs. System.console() returns null in IDEs; it only works when Java is invoked from the native OS console. Therefore it is pretty much useless.

-- Kamil Sevecek

On Thu, 13 Oct 2022 at 19:07, Ron Pressler <ron.press...@oracle.com> wrote:

> Hi.
>
> The appropriate list is core-libs-dev, where this discussion should continue.
>
> System.in is the standard input, which may or may not be the keyboard. For keyboard input, take a look at the java.io.Console class [1], in particular its charset and reader methods.
>
> [1]: https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/io/Console.html
>
> -- Ron
>
> On 13 Oct 2022, at 16:20, Reinier Zwitserloot <rein...@zwitserloot.com> wrote:
>
> PREAMBLE: I’m not entirely certain amber-dev is the appropriate venue. If not, where should this be discussed? It’s not quite a bug but nearly so, and not quite a simple feature request either.
>
> JDK18 brought JEP400, which changes the default charset encoding to UTF-8. This, probably out of necessity, goes quite far, in that Charset.defaultCharset() is now more or less a constant - it always returns UTF_8. It’s now quite difficult to retrieve the OS-configured encoding (the ‘native’ encoding).
>
> However, that does mean one of the most common lines in all of java’s history is now necessarily buggy: new Scanner(System.in) is now broken. Always, unless your docs specifically state that you must feed the app UTF_8 data. Linting tools ought to flag it as incorrect. It’s incorrect in a nasty way too: initially it seems to work fine, but if you’re on an OS whose native encoding isn’t UTF-8, this is subtly broken; enter non-ASCII characters on the command line and the app doesn’t handle them appropriately. A bug that is literally utterly undiscoverable on macs and most linux computers, even.
> How can you figure out your code is broken if all the machines you test it on use UTF-8 as an OS default?
>
> This affects beginning java programmers particularly (who tend to be writing some command line-interactive apps at first). As Brian Goetz’s post “Paving the Onramp” (https://openjdk.org/projects/amber/design-notes/on-ramp) shows, the experience for new users is evidently of some importance to the OpenJDK team. In light of that, the current state of writing command line interactive java apps is inconsistent with that goal.
>
> The right way to read system input in a way that works in both pre- and post-JEP400 JVM editions appears to be, as far as I can tell:
>
> Charset nativeCharset = Charset.forName(System.getProperty("native.encoding", Charset.defaultCharset().name()));
> Scanner sc = new Scanner(System.in, nativeCharset);
>
> I’ll risk the hyperbole: That’s... atrocious. Hopefully I’m missing something!
>
> It breaks _thousands_ of blogs, tutorials, stack overflow answers, and books in the process: everything that contains new Scanner(System.in). Even sysin interaction that doesn’t use scanner is likely broken; the general strategy then becomes:
>
> new InputStreamReader(System.in);
>
> which suffers from the same problem.
>
> I see a few directions for trying to address this; I’m not quite sure which way would be most appropriate:
>
> - Completely re-work keyboard input, in light of *Paving the on-ramp*. Scanner has always been a problematic API if used for keyboard input, in that the default delimiter isn’t convenient.
>   I think the single most common beginner java stackoverflow question is the bizarre interaction between scanner’s nextLine() and scanner’s next(), and to make matters considerably worse, the proper fix (which is to call .useDelimiter("\\R") on the scanner first) is mentioned in less than 1% of answers; the vast majority of tutorials and answers tell you to call .nextLine() after every .nextX() call. That is a suboptimal suggestion (it now means using space to delimit your input is broken). Scanner is now also quite inconsistent: the constructor goes for ‘internet standard’, using UTF-8 as a default even if the OS does not, but the locale *does* go by platform default, which affects double parsing amongst other things: scanner.nextDouble() will require you to use commas as the fraction separator if your OS is configured to use the Dutch locale, for example. It’s weird that scanner neither fully follows common platform-independent expectations (english locale, UTF-8), nor local-platform expectations (OS-configured locale and OS-configured charset). One way out is to make a new API for ‘command line apps’ and take into account Paving the on-ramp’s plans when designing it.
> - Rewrite specifically the new Scanner(InputStream) constructor as defaulting to native encoding even when everything else in java defaults to UTF-8 now, because that constructor is 99% used for System.in. Scanner has its own File-based constructor, so new Scanner(Files.newInputStream(..)) is quite rare.
> - Define that constructor to act as follows: the charset used is the platform default (i.e., from JDK18 and up, UTF-8), *unless* arg == System.in is true, in which case the scanner uses native encoding. This is a bit bizarre to write in the spec but does the right thing in most circumstances, unbreaks thousands of tutorials, blogs, and answer sites, and is most convenient to code against.
>   That’s usually the case with voodoo magic (because this surely risks being ‘too magical’): it’s convenient and does the right thing almost always, at the risk of being hard to fathom and producing convoluted spec documentation.
> - Attack the problem that what’s really broken isn’t so much scanner, it’s System.in itself: byte based, of course, but now that all java methods default to UTF-8, almost all interactions with it (given that most System.in interaction is char-based, not byte-based) are now also broken. Create a second field or method in System that gives you a Reader instead of an InputStream, with the OS-native encoding applied to make it. This still leaves those thousands of tutorials broken, but at least the proper code is now simply new Scanner(System.charIn()) or whatnot, instead of the atrocious snippet above.
> - Even less impactful, make a new method in Charset to get the native encoding without having to delve into System.getProperty(). Charset.nativeEncoding() seems like a method that should exist. Unfortunately this would be of no help in creating code that works pre- and post-JEP400, but in time, having code that only works post-JEP400 is fine, I assume.
> - Create a new concept ‘represents a stream that would use platform native encoding if characters are read/written to it’, have System.in return true for this, and have filterstreams like BufferedInputStream just pass the call through, then redefine relevant APIs such as Scanner and PrintStream (i.e. anything that internalises conversion from bytes to characters) to pick the charset encoding (native vs UTF8) based on that property. This is a more robust take on ‘new Scanner(System.in) should do the right thing’. Possibly the in/out/err streams that Process gives you should also have this flag set.
>
> If it was up to me, I think a multitude of steps are warranted, each relatively simple.
>
> - Create Charset.nativeEncoding().
>   It simply returns Charset.forName(System.getProperty("native.encoding")), but with the advantage that it’s shorter, doesn’t require knowing a magic string, and will fail at compile time if compiled against versions that predate the existence of the native.encoding property, instead of throwing NPEs at runtime.
> - Create System.charIn(). It just returns an InputStreamReader wrapped around System.in, but with native encoding applied.
> - Put the job of how java apps do basic command line stuff on the agenda as a thing that should probably be addressed in the next 5 years or so, maybe after the steps laid out in Paving the on-ramp are more fleshed out.
> - In order to avoid problems, *before* the next LTS goes out, re-spec new Scanner(System.in) to default to native encoding, specifically when the passed inputstream is identical to System.in. Don’t bother with trying to introduce an abstracted ‘prefers native encoding’ flag system.
>
> --Reinier Zwitserloot
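
For readers who want to try the workaround discussed in this thread, here is a minimal sketch in Java. It is not taken from any of the messages: the class name NativeInput and its two helper methods are hypothetical names chosen for illustration. It assumes JDK 10 or later (for the Scanner(InputStream, Charset) constructor) and falls back to Charset.defaultCharset() when the native.encoding property is missing, as on JDKs that predate the property. It also applies the "\\R" delimiter mentioned in the thread, so each token is one whole line.

```java
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

// Hypothetical helper class; the name and methods are illustrative only.
final class NativeInput {

    // Returns the OS-configured charset named by the "native.encoding"
    // system property, or the JVM default charset if the property is
    // missing or names a charset this JVM cannot load.
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding");
        if (name == null) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Charset.defaultCharset();
        }
    }

    // Builds a Scanner that decodes with the given charset and uses any
    // line break as the token delimiter, so next() returns a whole line.
    static Scanner lineScanner(InputStream in, Charset charset) {
        return new Scanner(in, charset).useDelimiter("\\R");
    }

    public static void main(String[] args) {
        Scanner sc = lineScanner(System.in, nativeCharset());
        System.out.print("Name: ");
        if (sc.hasNext()) {
            System.out.println("Hello, " + sc.next());
        }
    }
}
```

Passing the charset explicitly keeps the behaviour the same whether or not the JVM default is UTF-8, which is the point of the snippet quoted earlier in the thread. Whether this is acceptable for beginner code is exactly what the thread is debating.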