On 7/30/13 4:06 PM, David DeHaven wrote:

Judging from the docs, nl_langinfo seems like a Unix portability
function (something more likely to be happy with ASCII in a
terminal), not something to be used by a native Cocoa application.

Exactly. So I think it expects to be called from the command line, inside a shell-style environment where LANG and related variables are set.

David suggests that calling nl_langinfo() is "asking the wrong question." In the particular context of double-click launching on Mac, you could say that's true (or at least asking the question in the wrong way).

But consider: the code in question is shared with other Unix platforms, and when running from the command line, shell scripts, etc., nl_langinfo() *is* the right way to ask the question.

Asking the right question for this specific context on Mac OS X (via NSLocale or CFLocale) would, I suspect, involve a fair amount of code surgery, and the end result would be the same. Given this, I think my proposed change is a good one from a practical standpoint.

Thank you, everyone, for your feedback.

-Brent

Apple is highly unlikely to change the behavior of nl_langinfo().

There is already code in the JDK that calls into JRSCopyPrimaryLanguage(), 
JRSCopyCanonicalLanguageForPrimaryLanguage(), and JRSSetDefaultLocalization() 
for exactly this purpose.

Please proceed with setting the encoding to UTF-8. It is the de facto standard
for every Cocoa application I have ever seen. US-ASCII is always the wrong
choice for a graphical app on OS X.

Regards,
Mike Swingler
Apple Inc.

On Jul 30, 2013, at 9:05 AM, Francis Devereux <fran...@devrx.org> wrote:

I suspect that Apple might be unlikely to change the value that nl_langinfo 
returns when LANG is unset.

However, it might be possible to fix this issue without second-guessing the
character set reported by the OS by calling [NSLocale currentLocale] (or the
CFLocale equivalent) instead of nl_langinfo. I think (although I haven't
checked) that [NSLocale currentLocale] determines the current locale using
a mechanism other than environment variables, because LANG is usually unset
for GUI apps on OS X.

On 30 Jul 2013, at 15:56, Scott Palmer <swpal...@gmail.com> wrote:

Then shouldn't you be complaining to Apple that the value returned by
nl_langinfo needs to be changed?
David's point seems to be that second guessing the character set reported
by the OS is likely to cause a different set of problems.

Scott


On Tue, Jul 30, 2013 at 10:14 AM, Johannes Schindelin <johannes.schinde...@gmx.de> wrote:

Hi,

On Tue, 30 Jul 2013, David Holmes wrote:

On 30/07/2013 5:54 AM, Brent Christian wrote:
On 7/28/13 10:13 PM, David Holmes wrote:
On 27/07/2013 3:53 AM, Brent Christian wrote:
Please review my fix for 8011194: "Apps launched via double-clicked
.jars have file.encoding value of US-ASCII on Mac OS X"

http://bugs.sun.com/view_bug.do?bug_id=8011194

In most cases of launching a Java app on Mac (from the cmdline, or
from a native .app bundle), reading and displaying UTF-8
characters beyond the standard ASCII range works fine.

A notable exception is the launching of an app by double-clicking
a .jar file.  In this case, file.encoding defaults to US-ASCII,
and characters outside of the ASCII range show up as garbage.

Why does this occur? What sets the encoding to US-ASCII?

"US-ASCII" is the answer we get from nl_langinfo(CODESET) because no
values for LANG/LC* are set in the environment when double-clicking a
.jar.

We get "UTF-8" when launching from the command line because the
default Terminal.app setup on Mac sets LANG for you (to
"en_US.UTF-8" in the US).

Sounds like a user environment error to me. This isn't my area but I'm
not convinced we should be second guessing what we think the encoding
should be.

Except that that is not the case here, of course. The user did *not* set
any environment variable in this case.

So we are not talking about "second guessing" or "user environment error"
but about a sensible default.

As to US-ASCII, sorry to say: the seventies called and want their
character set back.

There can be no question that UTF-8 is the best default character
encoding, or are you even going to question *that*?

What if someone intends for it to be US-ASCII?

Then LANG would not be unset, would it?

Hth,
Johannes




