Thanks for your help, Warren. I wrote my last message before seeing yours. I can see now that tracking all the text encoding changes can be confusing, but that generally only the last one matters (assuming the conversions are lossless).
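To make that point concrete, here is a minimal sketch in Python (not the Perl/Apache::ASP setup discussed in this thread). The sample string and variable names are hypothetical, and the "?" case shows only one possible way a question mark can arise; it is not a diagnosis of the configuration described here.

```python
# Hypothetical sample text; U+2019 is the typographic apostrophe.
text = "It\u2019s"

# The same character has different byte forms in different encodings.
cp1252_bytes = text.encode("cp1252")  # apostrophe is the single byte 0x92 (146)
utf8_bytes = text.encode("utf-8")     # apostrophe is the three bytes E2 80 99

# Intermediate conversions can be lossless:
# cp1252 bytes -> Unicode -> UTF-8 bytes -> Unicode
round_trip = cp1252_bytes.decode("cp1252").encode("utf-8").decode("utf-8")

# What the reader sees depends on the charset applied to the final bytes.
declared_correctly = utf8_bytes.decode("utf-8")   # apostrophe survives
declared_wrongly = utf8_bytes.decode("cp1252")    # three junk characters

# Forcing the character into a charset that lacks it substitutes "?".
forced = text.encode("iso-8859-1", errors="replace")

print(cp1252_bytes)             # raw bytes in Windows-1252
print(utf8_bytes)               # raw bytes in UTF-8
print(round_trip == text)       # whether the round trip preserved the text
print(ascii(declared_wrongly))  # escaped form of the misread text
print(forced)                   # bytes after lossy substitution
```

The intermediate steps do no harm as long as each one round-trips; the damage appears only when the final bytes are read with a charset other than the one they were written in, or when a character is forced into a charset that cannot represent it.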
Before I discovered that the AddDefaultCharset Apache directive would solve my problem, I found a stopgap solution of setting $Response->{Charset} in my script. Thanks again!

--- In [EMAIL PROTECTED], Warren Young <[EMAIL PROTECTED]> wrote:
> karl wrote:
> > I have
> > text output coming from a database and ' (apostrophes) are shown in
> > the browser (IE6) as ? (question marks).
>
> There's apostrophes and there are apostrophes. There's ASCII code 39,
> there's Windows code page 1252 code 146, there's Unicode code
> <mumble>.... The question is, which of these codes are in your
> database? You must know the answer to that question before you can
> decide how to proceed.
>
> Character code handling in the database/Apache::ASP/Perl5/Apache/browser
> chain is stranger than you probably expect. Here's a post I wrote a few
> months back detailing two chains I've personally observed:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01952.html
>
> Notice that I saw two rather different translation chains on my two test
> systems! Your particular configuration is quite different from either
> of mine, so it could give yet a third path.
>
> > The only thing I can figure out is that
> > original output shows up as encoded Unicode (UTF-8) in the browser;
>
> Don't guess, find out.
>
> The way I did the analysis to make that post I linked to, I dumped the
> text in question to a file at several places along the I/O chain, then I
> examined each file. You should also use a network sniffer to see what
> the HTTP headers and HTML data are without the browser getting in the
> way. There's a good list of sniffers in the Winsock Programmer's FAQ,
> if you don't have one already:
>
> http://tangentsoft.net/wskfaq/
>
> I think you'll find, as I did, that your characters are being translated
> back and forth between ISO 8859-x and Unicode multiple times, and that
> the last step isn't being done correctly.
>
> That last step is critical because of the high probability that the
> intermediate transformations are all lossless in your situation. All
> you have to do is communicate to the browser what the final character
> encoding is. In my particular situation, I had to change an Apache
> setting to make it send a header informing the browser that the
> character encoding was UTF-8. The browser was then able to display the
> web page correctly, never mind that the data was stored as ISO 8859-1
> (Latin-1) in the database, and translated back and forth several times
> along the path.
>
> > The only physical
> > difference I can find between the output generated by Apache::ASP
> > and IIS/ASP is that the Apache::ASP has Unix style LF line-endings
> > and the IIS/ASP has DOS/Windows style CRLF line-endings.
>
> I'll bet you didn't compare the HTTP headers. Different web servers,
> hence different headers, hence different browser interpretation.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]