I agree with Uwe: we should deprecate all methods/constructors that rely on the default charset.
And we should do that before changing to use UTF-8 by default.

Remi

On February 21, 2018 8:53:54 AM UTC, Uwe Schindler <uschind...@apache.org> wrote:
>Hi,
>
>> This draft JEP contains a proposal to use UTF-8 as the default charset
>> for the JVM, so that APIs that depend on the default charset behave
>> consistently across all platforms.
>>
>> For more details, please see:
>> https://bugs.openjdk.java.net/browse/JDK-8187041
>
>Thanks for finally adding a JEP like this. Thanks also to Robert Muir
>for always insisting on fixing this problem! I have a few comments:
>
>The JEP should NOT lead to new APIs that convert between characters and
>bytes no longer explicitly accepting a charset. One example is the
>proposed ByteBuffer methods taking String. The default ones would work
>with UTF-8, but it should still be possible for an API user to always
>pass a charset whenever there is a conversion between bytes and chars.
>This is especially important as the user may still change the default
>and break your app. Because the rule is still: Only YOU, the developer,
>know the charset of your stuff when you load a JAR resource file or
>pass a String to the network in a ByteBuffer!
>
>The biggest offenders here are also given as an example: FileReader
>and FileWriter. Although both classes subclass
>InputStreamReader/OutputStreamWriter and just pass the right delegate
>to the superclass in the ctor, both classes are missing the possibility
>to specify a charset. Because of this, the use of FileReader and
>FileWriter is completely forbidden in many Apache projects (Apache
>Lucene, Solr, Elasticsearch, Apache TIKA,...). So I'd suggest also
>fixing the API here and just adding the missing ctors.
>
>The Java 7+ methods in java.nio.file.Files already ignore the default
>charset and always use UTF-8. How to proceed with those? Should they be
>changed to follow the new mechanism? I'd suggest not doing this, as
>it's part of the spec (to use UTF-8) and should not rely on external
>forces, but I wanted to bring this up.
>
>Changing the default would help many users, if they are actually using
>newer JDKs. For those with older versions (and compiling their code
>against older versions), you still have to avoid the default charsets.
>In addition, as you can still change the "default charset", any library
>developer reading resources from its own JAR file or passing Strings to
>network protocols cannot rely on the default charset really being UTF-8
>(a user may have changed it to something else). Because of this, Apache
>libraries will forbid usage of all methods using default charsets (and
>locales + timezones). The "changeable default" does not affect
>application developers (because in most cases they control the
>environment), but library developers should always be explicit!
>
>For this to work, I also want to do some "advertisement": All library
>projects should use the Forbidden-Apis Maven/Gradle/Ant plugin to scan
>their bytecode for offenders using default charsets, default locales or
>relying on default timezones. See the blog post about it [1] and the
>project page [2]. The tool is also useful to replace "jdeps" in
>projects with Java versions before 8, as it can scan your code for
>access to internal JDK APIs, too. See the documentation [3] and github
>wiki pages for useful examples. It may also be a good idea to mention
>it in the JEP as a "workaround" or "further reading".
>
>Finally: Because one can still change the default, I'd propose to
>deprecate all methods that use a default charset (unrelated to actually
>changing the default). Only if you do this would tools like
>"forbiddenapis" become irrelevant for library developers.
>
>And finally, finally: I'd also propose to change the default Locale to
>Locale.ROOT (same issues). String.toLowerCase() in Turkish locales
>still breaks thousands of apps! That's a different JEP, but I would
>strongly support it!
>
>Uwe
>
>[1] http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
>[2] https://github.com/policeman-tools/forbidden-apis
>[3] https://jenkins.thetaphi.de/job/Forbidden-APIs/javadoc/
>
>-----
>Uwe Schindler
>uschind...@apache.org
>ASF Member, Apache Lucene PMC / Committer
>Bremen, Germany
>http://lucene.apache.org/

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
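
For readers of the archive, here is a minimal Java sketch of the "always be explicit" rule discussed in this thread. It is only an illustration, not code from the JEP or from any Apache project: the class name `ExplicitDefaults` and its helper methods are hypothetical. It uses the existing `java.nio.file.Files` methods that take a `Charset` in place of `FileReader`/`FileWriter`, and `Locale.ROOT` for case conversion.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class ExplicitDefaults {

    // Instead of new FileWriter(file): name the charset explicitly.
    static void write(Path file, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Instead of new FileReader(file): name the charset explicitly.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return in.readLine();
        }
    }

    // Instead of s.toLowerCase(): do not depend on the default locale.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            write(tmp, "Gr\u00fc\u00dfe\n");
            System.out.println(readFirstLine(tmp));
            System.out.println(lower("TITLE"));
            // Under a Turkish locale, 'I' lowercases to dotless U+0131.
            System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr-TR")));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the charset and the locale are passed at every call site, the result of these helpers does not change when a user overrides the JVM default, which is the situation library code has to assume.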