All my upload forms set accept-charset="utf-8", and we expect uploaded
filenames to contain non-ASCII (wide) characters.

The problem I hit is that ->basename does this:

$ perl -le 'use Catalyst::Request::Upload; my $upload =
Catalyst::Request::Upload->new( { filename => q[документ обучения.pdf] } );
print $upload->basename;'
_.pdf

That's pretty mangled.


The problem is that $upload->filename is not decoded, so the substitution
works on octets, not characters.

sub _build_basename {
    my $self = shift;
    my $basename = $self->filename;
    $basename =~ s|\\|/|g;
    $basename = ( File::Spec::Unix->splitpath($basename) )[2];
    $basename =~ s|[^\w\.-]+|_|g;
    return $basename;
}


Obviously, we want \w to match characters, not encoded octets. The filename
should be decoded, since it is character data.
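To illustrate the difference, here is a minimal standalone sketch. It copies
the three steps from _build_basename into a plain function (sanitize_basename
is a name made up for this example, not part of Catalyst) and runs it on the
same filename, first as UTF-8 octets and then after decoding with Encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Spec::Unix;

# Illustration only: the same steps as _build_basename, as a plain function.
sub sanitize_basename {
    my ($name) = @_;
    $name =~ s|\\|/|g;
    $name = ( File::Spec::Unix->splitpath($name) )[2];
    $name =~ s|[^\w\.-]+|_|g;
    return $name;
}

# The filename from the example above, written with escapes so this
# script does not depend on the source file's encoding.
my $chars = "\x{434}\x{43e}\x{43a}\x{443}\x{43c}\x{435}\x{43d}\x{442} "
          . "\x{43e}\x{431}\x{443}\x{447}\x{435}\x{43d}\x{438}\x{44f}.pdf";

# What arrives from the browser: UTF-8 encoded octets.
my $octets = encode( 'UTF-8', $chars );

binmode STDOUT, ':encoding(UTF-8)';

# On octets, every high byte fails \w, so the whole name collapses to "_".
print sanitize_basename($octets), "\n";

# On decoded characters, only the space is replaced.
print sanitize_basename( decode( 'UTF-8', $octets ) ), "\n";
```

The first line printed is the mangled _.pdf from above. The second keeps the
Cyrillic letters and replaces only the space with an underscore.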

Does it make sense to do it in Engine's prepare_uploads?

For example:

            my $u = Catalyst::Request::Upload->new(
               size => $upload->{size},
               type => scalar $headers->content_type,
               headers => $headers,
               tempname => $upload->{tempname},
               filename => $c->_handle_unicode_decoding($upload->{filename}),
            );
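
For anyone who wants to try the idea without touching Catalyst internals, a
rough sketch of the decoding step on its own might look like the following.
decode_upload_filename is a hypothetical helper, not an existing Catalyst
method, and it assumes the browser sent the filename as UTF-8. If the octets
are not valid UTF-8, it returns the input unchanged rather than dying.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Catalyst): decode an uploaded filename
# from UTF-8 octets, falling back to the raw value if decoding fails.
sub decode_upload_filename {
    my ($raw) = @_;
    return $raw unless defined $raw;

    # Work on a copy: Encode may modify its input when a CHECK is given.
    my $copy    = $raw;
    my $decoded = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ) };

    return defined $decoded ? $decoded : $raw;
}
```

Whether to fall back silently or reject the upload on invalid input is a
policy choice; the fallback here is just the least surprising default for a
sketch.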


-- 
Bill Moseley
mose...@hank.org
_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/