The code cited takes a small shortcut. It would break if the locale really
used UTF-16, or any encoding that has to shift into ASCII (or its
single-byte charset area) with a shift-in/shift-out character. So far I'm
not aware of any such locale on any of our supported platforms.
Historically, this kind of assumption could run into trouble when ported to
other platforms, such as EBCDIC-based systems (but I don't think that is a
problem in this case). Ideally, the code should probably be written to
handle a multi-byte "/", but obviously it was decided to take the shortcut
for better performance here.
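To make the shortcut concrete, here is a minimal sketch (not the JDK code itself) of what a "first byte is '/'" check sees under two encodings. The class name, the helper firstByteIsSlash and the sample path are made up for illustration. Java's UTF-16 charset writes a big-endian byte-order mark when encoding, so the first byte is 0xFE rather than '/'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SlashByteCheck {
    // Hypothetical helper: encode the path, then compare the first byte
    // to '/', the same kind of byte-level test the shortcut relies on.
    static boolean firstByteIsSlash(String path, Charset cs) {
        byte[] bytes = path.getBytes(cs);
        return bytes.length > 0 && bytes[0] == '/';
    }

    public static void main(String[] args) {
        String dir = "/home/user"; // illustrative absolute path
        // UTF-8 encodes '/' as the single byte 0x2F.
        System.out.println(firstByteIsSlash(dir, StandardCharsets.UTF_8));  // true
        // UTF-16 output starts with the BOM bytes 0xFE 0xFF.
        System.out.println(firstByteIsSlash(dir, StandardCharsets.UTF_16)); // false
    }
}
```

The same check would pass for any ASCII-compatible single-byte or multi-byte encoding, which is presumably why the shortcut has held up on the supported platforms.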
"We" have taken the stand for a long time that file.encoding is an
informative/read-only system property, mainly for two reasons. First, this
property is really defined/implemented/used as the default encoding that
the JVM uses to communicate with the underlying platform for
locale/encoding-sensitive stuff: the default encoding of file content, the
encoding of file paths, and the "text" encoding when using the platform
APIs, for example. It's like a "contract" between the JVM and the
underlying platform; it needs to be understood and agreed on by both. So it
needs to be set based on what your underlying system is using, not to
something you choose via either -D or System.setProperty. If your
underlying locale is not UTF-16, I don't think you should expect the JVM to
work correctly if it keeps "talking" UTF-16 to the underlying system, for
example by passing in a file name in UTF-16 when you are running in a UTF-8
locale. (It is more complicated on Windows, where you have a system locale
and a user locale, and historically file.encoding was used for both;
consider what happens if your system locale and user locale are set
differently...)
The property sun.jnu.encoding, introduced in JDK 6 (mainly to address the
issue we have with file.encoding on Windows), helps remove some of the
"pressure" from file.encoding. In theory, file.encoding should be used only
for the encoding of "file content", and sun.jnu.encoding should be used
when you need the encoding to talk to those platform APIs, so something
might be done here (currently file.encoding and sun.jnu.encoding are set to
the same thing on non-Windows platforms).
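A quick way to see what a given JVM picked up for the two properties is to print them. This is only a sketch: the class name and the describe helper are made up, sun.jnu.encoding is an internal property that may be absent on some runtimes, and the values depend entirely on the platform and locale, so no output is shown.

```java
import java.nio.charset.Charset;

public class ShowEncodings {
    // Hypothetical helper: format one system property, tolerating absence.
    static String describe(String key) {
        return key + " = " + System.getProperty(key, "(not set)");
    }

    public static void main(String[] args) {
        // Encoding intended for file content.
        System.out.println(describe("file.encoding"));
        // Encoding used when talking to platform APIs (internal property).
        System.out.println(describe("sun.jnu.encoding"));
        // The charset the runtime actually uses for charset-less conversions.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```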
The other reason is the timing of how file.encoding is initialized and how
it is used during the "complicated" system initialization stage; almost
everyone who has touched System.initializeSystemClass() got burned here and
there in the past :-) So sometimes you have to ask whether it is worth the
risk to change something that works for a use scenario that is not
"supported". That said, as I said above, something might be done to address
this issue, but obviously it is not a priority for now.
-Sherman
If you want to do -Dfile.encoding=xyz, you are on your own; it might work,
it might not.
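For application code, one way to avoid depending on the default at all is to pass the charset explicitly wherever strings and bytes are converted. A minimal sketch follows; the class and method names are made up, and UTF-8 is just an example choice. It does not change what JDK-internal code such as the snippet under discussion does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // String-to-bytes with an explicit charset instead of getBytes().
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stream-to-String with an explicit charset instead of the
    // charset-less InputStreamReader constructor.
    static String readAll(InputStream in) {
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("caf\u00e9");
        System.out.println(bytes.length); // 5: three ASCII bytes plus two for U+00E9
        System.out.println(readAll(new ByteArrayInputStream(bytes)).equals("caf\u00e9")); // true
    }
}
```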
On 7/4/2012 11:00 PM, Dawid Weiss wrote:
Well, what's the "right" way to enforce an initial encoding for
charset-less string-to-byte conversions and legacy streams? I still
think that snippet of code is buggy, no matter if file.encoding is or
isn't a supported settable property.
Besides, from what I see in the JDK code base, everything seems to be coded
in a way that allows an external definition of file.encoding (comments
inside System.c, for example). Where is it stated that file.encoding is
read-only?
Dawid
On Thu, Jul 5, 2012 at 3:09 AM, Xueming Shen <xueming.s...@oracle.com> wrote:
-Dfile.encoding=xyz is NOT a supported configuration. file.encoding is
supposed to be a read-only informative system property.
-Sherman
On 7/4/2012 1:21 PM, Dawid Weiss wrote:
There is a similar bug:
Bug 6795536 - No system start for file.encoding=x-SJIS_0213
Yeah... I looked at the sources in that package and there is at least
one more place which converts a String to bytes using getBytes(). This
seems to be a trivial fix in UnixFileSystem though. Anyway, bug ID for
this is:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7181721
Dawid
In this case on Windows.
-Ulf
On 04.07.2012 14:43, Dawid Weiss wrote:
Hi folks.
Run the following with -Dfile.encoding=UTF-16:
import java.util.TimeZone;

public class TestBlah {
    public static void main(String[] args) throws Exception {
        TimeZone.getDefault();
    }
}
On Linux (and any Unix-like system, I think) this results in:
java.lang.ExceptionInInitializerError
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:482)
at sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:477)
...
There is an encoding-sensitive part calling getBytes on the initial
path (and this screws it up):
// package-private
UnixFileSystem(UnixFileSystemProvider provider, String dir) {
    this.provider = provider;
    this.defaultDirectory = UnixPath.normalizeAndCheck(dir).getBytes();
    if (this.defaultDirectory[0] != '/') {
        throw new RuntimeException("default directory must be absolute");
    }
Filed a bug for this but don't have the ID yet.
Dawid