Stas Bekman wrote:

I have attempted to shoe-horn this into mod_perl's print() method (in "src/modules/perl/Apache.xs"). Here's the diff against mod_perl 1.28. [Unfortunately, I've had to comment out the first part of that "if" block, because otherwise I got an unresolved external symbol error for the PerlIO_isutf8() function (which may be because that function isn't documented in the perlapio manpage).]

--- Apache.xs.orig    2003-06-06 12:31:10.000000000 +0100
+++ Apache.xs    2003-07-15 12:20:42.000000000 +0100
@@ -1119,12 +1119,25 @@
    SV *sv = sv_newmortal();
    SV *rp = ST(0);
    SV *sendh = perl_get_sv("Apache::__SendHeader", TRUE);
+    /*PerlIO *fp = PerlIO_stdout();*/

    if(items > 2)
        do_join(sv, &sv_no, MARK+1, SP); /* $sv = join '', @_[1..$#_] */
    else
        sv_setsv(sv, ST(1));

+    /*if (PerlIO_isutf8(fp)) {
+        if (!SvUTF8(sv))
+            sv_utf8_upgrade(sv = sv_mortalcopy(sv));
+    }
+    else*/ if (DO_UTF8(sv)) {
+        if (!sv_utf8_downgrade((sv = sv_mortalcopy(sv)), TRUE)
+            && ckWARN_d(WARN_UTF8))
+        {
+            Perl_warner(aTHX_ packWARN(WARN_UTF8), "Wide character in print");
+        }
+    }
+
    PUSHMARK(sp);
    XPUSHs(rp);
    XPUSHs(sv);


Besides the problem with PerlIO_isutf8(), there are other problems that spring to my mind straight away with this:
- is getting the PerlIO * for STDOUT the right thing to be doing anyway?
- if "items > 2", do we need to handle the UTF8-ness of each of those items individually before we join them?
- we need to code this in such a way as to remain backwards compatible with older Perls.


Looks like this is the main question: do we handle UTF-8 only for Perl 5.8?

It's only Perl 5.8 that has the special "UTF-8 flag" that all of the functions above operate on. If a Perl variable contains a sequence of bytes that makes up a valid UTF-8 character but the string doesn't carry that flag, Perl's built-in print() doesn't do this automatic conversion anyway.


IOW,

   print "Content-type: text/plain\n\n";
   $a = "\xC3\xBC";
   print $a;

when retrieved from a mod_cgi server, produces (via od -b / od -c):

   0000000 303 274
   0000002

Perl 5.6 and older don't have the UTF-8 flag and hence don't do any automatic conversion in print(). With those versions, then, mod_perl's print() already behaves like Perl's own, without the discrepancy that exists under 5.8, so no change should be required.

Sure enough, looking at the "doio.c" source file in Perl 5.6.1, the entire chunk of code that I half-inched above is not present.

Steve


