Stas Bekman wrote:
I'd suggest taking whatever data you run s/// on and trying it outside mod_perl first. Maybe your filter or some previous filter has truncated a UTF-8 char in the middle? You should be aware that other filters are not aware of the encoding; they just give you the amount of data your filter asks for. So it's quite possible that you can't process the data as-is when you get it, because you may get only half of a char. You either need to recognize that and buffer the partial char for the next filter invocation, or ask for more data to get the other half.
It'd definitely be a good test to add to our test suite, once this is resolved on your side.
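The buffering described above could be sketched roughly as follows. This is only an illustration, not code from mod_perl: split_incomplete_utf8 is a hypothetical helper name, and it assumes the data is otherwise well-formed UTF-8, so the only possible damage is a sequence cut off at the very end of the buffer.

```perl
use strict;
use warnings;

# Hypothetical helper: split a byte string into ($complete, $tail), where
# $tail holds the bytes of a UTF-8 sequence cut off at the end of the
# buffer (or '' if the buffer ends on a character boundary).
# Assumes the input is otherwise well-formed UTF-8.
sub split_incomplete_utf8 {
    my ($bytes) = @_;
    my $len = length $bytes;

    # A UTF-8 sequence is at most 4 bytes long, so an unfinished one can
    # only start within the last 3 bytes.
    for my $back (1 .. 3) {
        last if $back > $len;
        my $byte = ord substr($bytes, -$back, 1);
        next if ($byte & 0xC0) == 0x80;   # continuation byte, keep looking back
        last if $byte < 0x80;             # plain ASCII, nothing is pending

        # Lead byte: work out how many bytes its sequence needs.
        my $need = $byte >= 0xF0 ? 4 : $byte >= 0xE0 ? 3 : 2;
        last if $need <= $back;           # the sequence is complete

        return (substr($bytes, 0, $len - $back), substr($bytes, -$back));
    }
    return ($bytes, '');
}

# "é" is the two bytes C3 A9; here the buffer ends after the first one.
my ($complete, $tail) = split_incomplete_utf8("caf\xC3");
```

In a filter, $tail would be kept and prepended to the next chunk read, in the same way the handler further down carries over $leftover.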
Thanks Stas,
Here is my handler(). How can you tell whether you're in the middle of a UTF-8 character or not? Also, does Perl know at this point that it is a UTF-8 string, or do I need to tell it again (i.e. as the string goes through Apache, does it lose its UTF-8 bit)?
    sub handler {
        my $f = shift;
        my $r = $f->r;

        unless ($f->ctx) {
            $f->r->headers_out->unset('Content-Length');
            set_globals($f->r);
            $f->ctx(1);
            $leftover = '';
        }

        while ($f->read(my $buffer, BUFF_LEN)) {
            $f->print (do_it ($r, $leftover . $buffer));
        }
        return Apache::OK;
    }

    sub do_it {
        my $r = shift;
        local ($_);
        $_ = shift;

        # [insert the regex you've seen]

        # bucket brigades may split an NMML expression over
        # multiple buckets hold on to any trailing NMML and
        # prepended it to the start next time
        if (/<\?nm/) {
            ($_, $leftover) = split (/<\?nm/, $_, 2);
            $leftover = "<?nm$leftover";
        }
        else {
            $leftover = '';
        }

        return $_;
    }
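On the question of whether Perl still knows the data is UTF-8: as far as I understand it, the filter reads raw bytes, so the string would need to be decoded before a character-aware regex sees it. A minimal sketch of one way to do that with the core Encode module follows. It relies on the documented behaviour of the FB_QUIET check mode, where decode() returns what it could process and leaves the unprocessed bytes in the source variable. The names decode_chunk and $pending are made up for the illustration, and the chunks are simulated rather than read from a real filter.

```perl
use strict;
use warnings;
use Encode ();

my $pending = '';   # undecoded bytes carried over between chunks

# Hypothetical helper: append a chunk of raw bytes and return the
# characters that could be decoded so far.
sub decode_chunk {
    my ($chunk) = @_;
    $pending .= $chunk;
    # With FB_QUIET, decode() stops at the first sequence it cannot
    # decode and overwrites $pending with the unprocessed remainder.
    return Encode::decode('UTF-8', $pending, Encode::FB_QUIET);
}

# Simulate "é" (bytes C3 A9) being split across two reads.
my $first  = decode_chunk("caf\xC3");
my $second = decode_chunk("\xA9!");
```

Note that a genuinely malformed byte in the middle of the stream would also stop the decode, so a real filter would need some way to skip or replace such bytes instead of buffering them forever. The result would also have to be encoded back to bytes (for example with Encode::encode) before being printed.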
-- Matthew Darwin [EMAIL PROTECTED] http://www.mdarwin.ca