2008/6/10 PENPEN <[EMAIL PROTECTED]>:

>
> Hi there
> I ran into a problem when using Django to handle a file upload request
> whose file name is in Chinese. The Chinese part of the filename
> is truncated, e.g. I only get ".txt" when the input filename is
> "汉.txt"


I suspect the truncation is due to this bug:

http://code.djangoproject.com/ticket/3119

(That is, assuming you are using a model FileField or ImageField.)
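To illustrate the symptom, here is a minimal Python 3 sketch of a filename filter that keeps only ASCII letters, digits, '-', '_' and '.'. The function name ascii_only_name and its regular expression are hypothetical stand-ins, not Django's actual code. They only show how a filter that strips non-ASCII characters turns "汉.txt" into ".txt".

```python
import re

# Hypothetical whitelist: anything that is not an ASCII letter, digit,
# '-', '_' or '.' is removed. This is NOT Django's real implementation.
_NOT_ALLOWED = re.compile(r"[^-A-Za-z0-9_.]")


def ascii_only_name(name: str) -> str:
    """Return `name` with every non-whitelisted character deleted."""
    return _NOT_ALLOWED.sub("", name)


print(ascii_only_name("汉.txt"))        # .txt
print(ascii_only_name("my notes.txt"))  # mynotes.txt
```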

> I use the following clean function:
>    def clean_file(self):
>        if self.cleaned_data['file']:
>            fn = self.cleaned_data['file'].filename.decode(locale.getdefaultlocale()[1])
>            fn = '%s_%s' % ("test", fn)
>            self.cleaned_data['file'].filename = fn
>            return self.cleaned_data['file']
>        raise forms.ValidationError(_('Please upload a file.'))
>

The fact that you have to decode the supplied bytestring filename into a
unicode object is, I think, also a bug:

http://code.djangoproject.com/ticket/6009


> But it kept reporting an error.


What error?  Previously you just mentioned disappearing non-ASCII chars.
The fact that you are using locale.getdefaultlocale()[1] doesn't seem quite
right to me.  The encoding of the filename coming in isn't necessarily going
to match the Django server's default locale encoding.  I don't know for sure
(and don't have time to comb through standards to figure it out), but I'd
expect it to be either utf8 or somehow specified in the metadata for the
incoming request.  As a first step I'd try just specifying 'utf8' there
instead of the locale... thing.
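A minimal sketch of that first step, assuming modern Python 3 (where the raw name is a bytes object rather than a Python 2 str) and assuming UTF-8 really is what the client sends. decode_upload_name is a hypothetical helper, not part of Django.

```python
def decode_upload_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a raw uploaded file name using an explicit encoding.

    UTF-8 is the default guess instead of the server's locale encoding,
    since the client's encoding need not match the server's.
    """
    return raw.decode(encoding)


name = decode_upload_name(b"\xe6\xb1\x89.txt")
print(ascii(name))  # '\u6c49.txt'
```

With the default strict error handling, bytes that are not valid in the chosen encoding raise UnicodeDecodeError instead of being silently dropped.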


> I use "汉" as a test file name.


This is Unicode character U+6C49.
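A quick Python 3 check of that code point, using only the standard unicodedata module:

```python
import unicodedata

ch = "\u6c49"  # the character from the test file name

print(hex(ord(ch)))          # 0x6c49
print(unicodedata.name(ch))  # CJK UNIFIED IDEOGRAPH-6C49
```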


> Under python console, it is encoded as '\xba\xba' and I could use
> decode('cp936') to get the unicode representation.


Under python console where?  On the source machine or the Django server?
The character U+6C49 is indeed encoded as '\xba\xba' in Windows code page
936, so that's OK, but it seems you've got at least one machine involved
here that doesn't use utf8 as its default encoding. I don't know from what
you've supplied if that's the machine where the file is coming from or where
you are trying to upload it to (or both).
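The code page 936 bytes can be confirmed the same way. Python 3 accepts "cp936" as an alias for its GBK codec:

```python
ch = "\u6c49"

# Encode the character with Windows code page 936 and decode it back.
cp936_bytes = ch.encode("cp936")
print(cp936_bytes)                        # b'\xba\xba'
print(cp936_bytes.decode("cp936") == ch)  # True
```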


> But from the django log, I saw that the filename is '\xe6\xb1\x89'.
> I'm not sure what it is.


What Django log?  '\xe6\xb1\x89' is the utf8 encoding of U+6C49.  Same
character, different encoding.  Is this the value you see if you print the
repr() of self.cleaned_data['file'].filename on entry to your clean()
function?  If so then you are getting utf8-encoded data there so you want to
decode using utf8.  That'll give you a unicode object containing U+6C49
which you can re-encode using a different encoding for file system storage
if necessary (that is if your Django server doesn't use utf8 as its file
system encoding).
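A minimal Python 3 sketch of that decode-then-re-encode step. The transcode helper is hypothetical, and cp936 as the target is only an example of a non-UTF-8 file system encoding:

```python
def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Decode `raw` from `source` to text, then encode that text as `target`."""
    return raw.decode(source).encode(target)


logged = b"\xe6\xb1\x89"  # the bytes seen in the Django log (UTF-8)
print(transcode(logged, "utf-8", "cp936"))  # b'\xba\xba'
```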

> How can I get the filename? Or is there any workaround to
> solve this problem?
> Many thanks!
>

It sounds like you are getting the filename, utf8 encoded.   If that's not
the default encoding for your server (is your server file system using
Windows code page 936?) then you may need to take the input bytestring,
decode('utf8') to a unicode object, and then re-encode to a bytestring using
your server's default encoding (plus work around the fact that Django
currently strips non-ASCII chars from FileField file names...some ideas on
that are included in the tickets I linked to).
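Putting those steps together, here is a sketch of how the renaming in the original clean_file() might look in plain Python 3, outside Django. storage_name and its parameters are hypothetical; the sketch assumes the incoming name is UTF-8 and that the target encoding is known. It does not by itself work around Django stripping non-ASCII characters from FileField names; that still needs one of the approaches from the tickets.

```python
import sys


def storage_name(raw: bytes, prefix: str = "test",
                 fs_encoding: str = sys.getfilesystemencoding()) -> bytes:
    """Build '<prefix>_<name>' from a UTF-8 upload name, encoded for storage.

    Non-ASCII characters are preserved. Strict encoding means a character
    the target encoding cannot represent raises UnicodeEncodeError instead
    of being silently dropped.
    """
    name = raw.decode("utf-8")
    return f"{prefix}_{name}".encode(fs_encoding)


print(storage_name(b"\xe6\xb1\x89.txt", fs_encoding="cp936"))
# b'test_\xba\xba.txt'
```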

Karen

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
