Re: Insufficiencies in JEP: 400: UTF-8 by Default

Roger Riggs Wed, 31 Mar 2021 10:04:23 -0700

Hi Anthony,

A draft of updates to the Process API is in the works and covers improving
the ease of use and providing Readers and Writers. Note that if process output is redirected to a file, the Process API does not interpose on the byte streams and is not in
a position to affect the character set used by the child process.


Regards, Roger
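As Roger points out, a redirect to a file hands the child's stdout straight to that file, so the JDK never decodes those bytes and whoever reads the file later has to supply the child's charset. The sketch below illustrates Anthony's scenario under stated assumptions: the class and method names are hypothetical, the child is assumed to write in the platform's native encoding, and `native.encoding` is a system property that only exists on JDK 17 and later (hence the fallback to `Charset.defaultCharset()`).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: run a child with stdout redirected to a file, then decode that file. */
final class RedirectedOutput {

    private RedirectedOutput() {}

    /**
     * Best guess at the charset a native child process writes in.
     * Assumption: "native.encoding" is only set on JDK 17+; older JDKs fall back to
     * Charset.defaultCharset(), which before JEP 400 normally reflected the platform encoding.
     */
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    /** Starts the command with stdout going straight to a file; the JDK never sees those bytes. */
    static Process startRedirected(List<String> command, Path outFile) throws IOException {
        return new ProcessBuilder(command)
                .redirectOutput(outFile.toFile())
                .start();
    }

    /** Decodes the redirected file with the charset the child is assumed to have used. */
    static String readOutput(Path outFile, Charset childCharset) throws IOException {
        return Files.readString(outFile, childCharset);
    }
}
```

`Files.readString` throws an `IOException` instead of substituting replacement characters when the bytes are not valid in the chosen charset (for example Windows-1252 output read as UTF-8), so a wrong guess tends to surface early.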


On 3/30/21 1:03 PM, Anthony Vanelverdinghe wrote:

Hi Alan

As Marco mentioned, another use case is sub-process stdin/stdout/stderr. In my 
particular instance, I'm starting a Process which has its output redirected to 
a file. It uses the platform's default encoding for writing to stdout. So when 
I want to read its output from the file at some later point, I need to supply 
that encoding to the Files API.
One way to accommodate this use case is a method that retrieves the
platform's default encoding, for example a method `platformEncoding` in Charset or
Process, or the `Console::charset` method you mentioned. Another option would be to
enhance the Process API, by adding methods to Process which return appropriate 
Readers/Writers & adding methods of the form `redirectX(File file, Charset 
encoding)` to ProcessBuilder. But this seems like a lot of additional API surface, 
just to avoid surfacing the platform's default encoding itself.
So shouldn't the JEP specify how it will address these use cases involving the
Process API?

Kind regards,
Anthony
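Anthony's second option, methods on Process that return Readers/Writers, amounts to wrapping the child's byte streams with an explicitly named charset. A minimal sketch, assuming a hypothetical `ProcessText` utility rather than any JDK API; the stream-level methods exist so the encoding behaviour can be exercised without starting a process. For reference, JDK 17 later added `Process.inputReader(Charset)`, `Process.errorReader(Charset)` and `Process.outputWriter(Charset)`, which serve the same purpose for piped (non-redirected) streams.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

/**
 * Hypothetical helpers in the spirit of "methods on Process which return appropriate
 * Readers/Writers": the caller states the child's charset instead of relying on the default.
 */
final class ProcessText {

    private ProcessText() {}

    /** Reader over the child's stdout (Process.getInputStream()), decoded with cs. */
    static BufferedReader stdoutReader(Process p, Charset cs) {
        return reader(p.getInputStream(), cs);
    }

    /** Writer to the child's stdin (Process.getOutputStream()), encoded with cs. */
    static BufferedWriter stdinWriter(Process p, Charset cs) {
        return writer(p.getOutputStream(), cs);
    }

    /** Stream-level decoding, usable without starting a process. */
    static BufferedReader reader(InputStream in, Charset cs) {
        return new BufferedReader(new InputStreamReader(in, cs));
    }

    /** Stream-level encoding, usable without starting a process. */
    static BufferedWriter writer(OutputStream out, Charset cs) {
        return new BufferedWriter(new OutputStreamWriter(out, cs));
    }
}
```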

On Sunday, March 14, 2021 13:01 CET, Alan Bateman <[email protected]> wrote:

On 14/03/2021 11:00, Marco wrote:

:

IMO Charset should provide standardized getters for the OS charset and the
console charset. The latter being different has been a long-standing issue on
Windows, where the code page differs between its CLI and regular environments.
OpenJDK has the necessary data already available in its custom system
properties.

The console charset is currently hidden because PrintStream neither exposes the
underlying OutputStreamWriter nor offers getEncoding() itself. The OS charset
would be hidden in the future by the change to Charset.defaultCharset()'s
specification in JEP 400.
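The "custom system properties" Marco mentions presumably include OpenJDK's internal encoding properties. A best-effort probe for the stdout charset might look like the sketch below; the class name is hypothetical, `sun.stdout.encoding` is undocumented and OpenJDK-specific, and the lookup order is chosen for illustration. On newer JDKs, the documented `stdout.encoding` property (JDK 19+) and `PrintStream.charset()` (JDK 18+) make this guesswork unnecessary.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: best-effort guess at the charset System.out encodes with. */
final class ConsoleCharsetProbe {

    private ConsoleCharsetProbe() {}

    static Charset stdoutCharset() {
        // Checked in order; each may be absent depending on JDK version and platform:
        //   stdout.encoding     - documented, JDK 19+
        //   sun.stdout.encoding - undocumented OpenJDK property, typically only set
        //                         on Windows when stdout is attached to a console
        //   native.encoding     - documented, JDK 17+
        for (String key : new String[] {"stdout.encoding", "sun.stdout.encoding", "native.encoding"}) {
            Charset cs = lookup(System.getProperty(key));
            if (cs != null) {
                return cs;
            }
        }
        return Charset.defaultCharset();
    }

    private static Charset lookup(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }
}
```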

The intention is that there will be little or no impact on the console
streams. This means that java.io.Console reader/writer methods should
continue to return a Reader/PrintWriter that uses the platform encoding
(or code page on Windows). Same thing for the System.out/System.err
print streams. We need to make this clearer in the JEP.

There has been discussion on this mailing list about adding a
Console::charset method but it didn't come to a consensus. Naoto Sato
and I have been chatting about it again recently as there may be a need
to add an API in advance of proposing to target the JEP.

One case that we are still mulling over is code that creates an
InputStreamReader on System.in without specifying the charset. This
might be older code that pre-dates java.io.Console or maybe code that
wasn't tested on a wide range of platforms. Options range from a spec
change to doing nothing (the latter meaning such code would need to run
with "COMPAT" or be migrated to use the 2-arg constructor, as the default
charset is not the right choice).

-Alan
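For the System.in case Alan describes, the difference between the 1-arg and 2-arg InputStreamReader constructors only shows up when the input bytes are not valid UTF-8. A minimal sketch with hypothetical names: the legacy form picks up `Charset.defaultCharset()` (UTF-8 once JEP 400 applies), while the migrated form makes the caller name the charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

/** Contrast between relying on the default charset and naming one explicitly. */
final class StdinDecoding {

    private StdinDecoding() {}

    /**
     * Legacy style: decodes with Charset.defaultCharset(), which JEP 400 changes to UTF-8.
     * One-off read only; BufferedReader reads ahead, so real code should keep one reader.
     */
    static String firstLineDefault(InputStream in) throws IOException {
        return new BufferedReader(new InputStreamReader(in)).readLine();
    }

    /** Migrated style: the 2-arg constructor states the charset the input was written in. */
    static String firstLine(InputStream in, Charset cs) throws IOException {
        return new BufferedReader(new InputStreamReader(in, cs)).readLine();
    }
}
```

Applied to real code, `new InputStreamReader(System.in)` becomes `new InputStreamReader(System.in, cs)`; deciding what `cs` should be (for instance the `native.encoding` value on JDK 17+) remains the caller's job.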
