OK, I'm posting this to spare others the pain I have suffered over this issue.
Both Apache and Java take pains to support internationalization. This is
the root of the problem.
When my Java program wrote its output file, it picked up the character
set from the Apache environment. On both of the Apache servers I checked,
that was an ANSI character set, even though the rest of the operating
system was running UTF-8.
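If you want to see which character set Java picks up in each
environment, a small diagnostic like the following can help. This is a
sketch, not code I actually ran; the class name ShowDefaultCharset is
made up. Run it once from the command prompt and once from the CGI
script and compare the output. On Linux, a pre-Java-18 JVM started with
no locale variables set (which is common under a web server) typically
reports ANSI_X3.4-1968, i.e. plain US-ASCII. Java 18 and later default
to UTF-8 regardless of the locale, so this problem mostly affects older
JVMs.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Builds a short report of the encoding-related settings the JVM
    // picked up from its environment. Unset variables print as "null".
    static String report() {
        return "default charset: " + Charset.defaultCharset() + "\n"
             + "file.encoding:   " + System.getProperty("file.encoding") + "\n"
             + "LANG:            " + System.getenv("LANG") + "\n"
             + "LC_ALL:          " + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```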
That is why my program worked fine when I ran it from the command
prompt: the command prompt was running under UTF-8. Under Apache,
though, my special Unicode characters were being translated incorrectly,
because I needed the UTF-8 character set to produce the character I
wanted.
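To see why the character gets mangled, here is a small sketch that
encodes the same character, U+00A1 (the delimiter from my original
message below), with a few different character sets. EncodeDemo and its
helper methods are names made up for illustration. Note that a
character set that cannot represent the character does not raise an
error: String.getBytes silently substitutes the charset's replacement
byte, which for US-ASCII is '?' (0x3F).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Formats each byte as two-digit uppercase hex, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the bytes as hex.
    // Unmappable characters are replaced, not reported.
    static String encode(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        String delimiter = "\u00A1"; // U+00A1, inverted exclamation mark
        System.out.println("UTF-8:      " + encode(delimiter, StandardCharsets.UTF_8));      // C2 A1
        System.out.println("ISO-8859-1: " + encode(delimiter, StandardCharsets.ISO_8859_1)); // A1
        System.out.println("US-ASCII:   " + encode(delimiter, StandardCharsets.US_ASCII));   // 3F ('?')
    }
}
```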
After a week of research, I finally figured out that I needed to use the
OutputStreamWriter class and give it "UTF-8" as the character set for
that file. Everything is now working fine.
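For anyone who wants to see what that looks like, here is a minimal
sketch of writing a file through OutputStreamWriter with an explicit
UTF-8 character set. It is not my actual program: the class name, the
writeLines method, the file name output.txt and the sample records are
all made up for illustration. I used StandardCharsets.UTF_8 here;
passing the string "UTF-8" works too, but that constructor can throw
UnsupportedEncodingException.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8FileWriter {
    // Writes each line to the file, explicitly encoded as UTF-8, instead of
    // relying on the JVM's default charset.
    static void writeLines(String path, String[] lines) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        } // try-with-resources flushes and closes the whole writer chain
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical records delimited with U+00A1.
        String[] records = { "alpha\u00A1beta\u00A1gamma", "one\u00A1two\u00A1three" };
        writeLines("output.txt", records);
    }
}
```

The point is simply not to rely on the default: FileWriter (before Java
11 added constructors that take a Charset) and the one-argument
OutputStreamWriter constructor both use the JVM's default charset, which
is exactly what changed under Apache.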
Hope this helps someone else. Sorry for being slightly off-topic, but I
thought this was useful information anyway.
--- Begin Message ---
OK, not exactly Perl, but this was the closest list I could find.
I am running a Perl CGI script that launches a Java program. The Java
program writes output files that are delimited using what I believe to
be a Unicode character. In most editors it looks like an upside-down
exclamation mark, which I believe is correct; in some editors it shows
as a degree symbol. The character is encoded as the byte pair 0xC2 0xA1.
Here is the character: '¡'.
Now here is the problem. When I test my Java program on its own,
everything is great. When I test the Perl script that launches the Java
program, all is still well. When I run my Perl script through CGI,
though, each occurrence of the above character is replaced with ??. I
cannot understand why CGI is interfering with my program's file output.
Nothing is going to the display; the program writes this file directly.
Does anyone have any ideas?
Also, if anyone can suggest a better list, I'd appreciate that too.
Thanks.
--- End Message ---