Hi,

the file upload is handled by CGI.pm and not by Embperl itself. It looks like 
CGI.pm is doing some UTF8 conversion (or it is done when you write the file).

Perl's UTF-8 handling is a kind of mystery (at least to me). Every time I 
thought I had understood what is going on, I got a new surprise.

In the past, the only way I got around it was by trial and error :-(

You might specify a binary encoding in your open statement (binmode only sets 
the CRLF <-> LF conversion, but it doesn't change the charset conversion).
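
As a rough sketch of that suggestion: the helper name save_upload_raw and the
32768-byte chunk size below are made up for illustration, and $in stands for
whatever filehandle the upload gives you (for example the one returned by
CGI.pm's upload method). The point is only that the open call carries the
':raw' layer, so no CRLF translation and no encoding layer are applied when
writing:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Embperl or CGI.pm): copy an already-open
# upload filehandle to $dest as raw bytes and return the number of bytes read.
sub save_upload_raw {
    my ($in, $dest) = @_;

    # Assumption: make sure the source handle delivers bytes as well.
    binmode($in, ':raw');

    # ':raw' in the open mode: no CRLF translation, no encoding layer.
    open(my $out, '>:raw', $dest) or die "Cannot open $dest: $!";

    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, 32768);   # chunk size is arbitrary
        die "Read error: $!" unless defined $n;
        last if $n == 0;                     # end of file
        print {$out} $buf or die "Write error on $dest: $!";
        $total += $n;
    }

    close($out) or die "Cannot close $dest: $!";
    return $total;
}
```

If the size and md5sum of the written file still differ from the original
after this, the conversion is probably happening before the data reaches your
code, not when you write the file.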

Gerald


> -----Original Message-----
> From: Jean-Christophe Boggio [mailto:embp...@thefreecat.org]
> Sent: Thursday, April 05, 2012 6:43 PM
> To: Chris Allen
> Cc: embperl@perl.apache.org
> Subject: Re: Upload problem
> 
> Thanks for taking the time to help me.
> 
> On 05/04/2012 08:48, Chris Allen wrote:
> > Can you include all of the headers here please?
> 
> I have attached the beginning of the dump (tcpdump addresses are changed
> to aa.aa.aaa.aa and bb.bbb.bb.bb but it's easy to find the real ones). Hope
> the list accepts attachments.
> The whole dump is 2.5Mb so I won't post it to the list but I have it handy if
> you need.
> 
> > It's possible you have more than one issue here. Firstly, what happens
> > if you upload several textfiles (ASCII data only)? Do they upload
> > correctly? Or perhaps they upload correctly but truncated?
> 
> Uploaded the full tcpdump (2670592 bytes). It's pure 7-bit ASCII: same size,
> same md5sum.
> Uploaded a linux-header Makefile (53Kb). Probably 7-bit ASCII: same size,
> same md5sum.
> 
> Uploaded a big ASCII file containing a few accents :
>   1395336 original
>   1395118 copy
> Results are... insane : here is the diff :
> 
> diff -u 0410959v-phase2.txt 14.jpg
> --- original  2011-09-05 15:18:49.000000000 +0200
> +++ copy      2012-04-05 16:17:22.091080638 +0200
> @@ -38,18 +38,18 @@
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized value in numeric eq (==) at ext-bin/do_5_gense2.pl line 1126.
> +Use of uninitialized value in numeric et-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use ofed value in n=) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use ozed value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized valu et-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninite in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized value in num a_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized eric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized value in numeric eq (inhiers_phase2.pl line 1126.
> +Use of uninitialized value in ==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> +Use of uninitialized value in numeric eq (==) at exense2.pl line 1126.
> +Use of uninitialized value in numeric et-bichiers_phase2.pl line 1126.
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
>   Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
> @@ -258,7 +258,7 @@
>   Warning: Permanently added '[10.141.0.61]:2222' (RSA) to the list of known hosts.
>   Ubuntu 10.04.3 LTS
>   Warning: Permanently added '192.168.122.130' (RSA) to the list of known hosts.
> -Arret du LDAP (patienter 10 secondes)
> +Arret  LDAP (patienter 10 secondes)
>   Stopping daemon monitor: monit.
>   Stopping OpenLDAP: slapd.
>   tar: Removing leading `/' from member names
> 
> The differences are on lines 41-52 and 261, though the file is 23818 lines
> long. I guess that means only one 32768-byte buffer got "corrupted"?
> Accents occur only on lines 2-191 (and not on every line). The accents are
> still there, untouched. In the original file, they are UTF-8 encoded:
> iconv -f utf8 -t latin1 original >/dev/null
>    -> no error
> 
> Also the files are not "truncated", there are bits randomly missing in the
> middle.
> 
> 
> So as I understand it, the problems (UTF-8 encoding + bits missing) arise only
> when non-UTF8 characters are encountered.
> 
> If you have ideas of where/what I can look next...
> 
> Thanks for your patience,
> 
> --
> Jean-Christophe Boggio                       -o)
> embp...@thefreecat.org                       /\\
> Independent Consultant and Developer        _\_V
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: embperl-unsubscr...@perl.apache.org
> For additional commands, e-mail: embperl-h...@perl.apache.org

