Makoto, Michael, Gabor, Ivan:

Following the discussion, I'm coming to think that the definition (or
policy) of a "valid filename (for storage)" varies with the circumstances
and the field each developer works in, so it would be hard to provide a
single flawless way of normalizing filenames. Moreover, altering
get_valid_filename() could break a lot of existing code, which would be
unacceptable to most developers who have already written apps with
FileField.

Instead, I'd like to propose adding a ``filename_normalizer`` parameter
to FileField's constructor, with get_valid_filename() as its default
behaviour, like this::

{{{

Index: db/models/fields/__init__.py
===================================================================
--- db/models/fields/__init__.py        (revision 4447)
+++ db/models/fields/__init__.py        (working copy)
@@ -580,8 +580,10 @@
         return forms.EmailField(**defaults)
 
 class FileField(Field):
-    def __init__(self, verbose_name=None, name=None, upload_to='', **kwargs):
+    from django.utils.text import get_valid_filename
+    def __init__(self, verbose_name=None, name=None, upload_to='', filename_normalizer=get_valid_filename, **kwargs):
         self.upload_to = upload_to
+        self.filename_normalizer = filename_normalizer
         Field.__init__(self, verbose_name, name, **kwargs)
 
     def get_manipulator_fields(self, opts, manipulator, change, name_prefix='', rel=False, follow=True):
@@ -656,8 +658,7 @@
         return os.path.normpath(datetime.datetime.now().strftime(self.upload_to))
 
     def get_filename(self, filename):
-        from django.utils.text import get_valid_filename
-        f = os.path.join(self.get_directory_name(), get_valid_filename(os.path.basename(filename)))
+        f = os.path.join(self.get_directory_name(), self.filename_normalizer(os.path.basename(filename)))
         return os.path.normpath(f)
 
 class FilePathField(Field):

}}}

With this change, you can specify your own filename normalization code
that fits your particular requirements. The change does not affect any
existing code, because the current get_valid_filename() is used as the
default.
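To make the hook concrete, here is a minimal standalone sketch in plain Python 3 (no Django needed). `FileFieldSketch` is a hypothetical stand-in that copies only the patched `get_directory_name()`/`get_filename()` logic; `ascii_only_filename` merely approximates the current get_valid_filename() by keeping `[-A-Za-z0-9_.]`, and `unicode_safe_filename` is one possible custom normalizer that keeps Unicode letters and digits instead. None of these names exist in Django.

```python
import datetime
import os
import re


def ascii_only_filename(s):
    # Approximation of the current get_valid_filename(): spaces become
    # underscores, then everything outside [-A-Za-z0-9_.] is removed.
    s = s.strip().replace(' ', '_')
    return re.sub(r'[^-A-Za-z0-9_.]', '', s)


def unicode_safe_filename(s):
    # Hypothetical custom normalizer: same idea, but any Unicode "word"
    # character (letters, digits, underscore) is kept, so multibyte
    # names such as 'ファイル.txt' survive.
    s = s.strip().replace(' ', '_')
    return re.sub(r'[^-\w.]', '', s)


class FileFieldSketch(object):
    # Hypothetical stand-in for the patched FileField: only the parts
    # that get_filename() depends on. Not Django's real class.
    def __init__(self, upload_to='', filename_normalizer=ascii_only_filename):
        self.upload_to = upload_to
        self.filename_normalizer = filename_normalizer

    def get_directory_name(self):
        return os.path.normpath(datetime.datetime.now().strftime(self.upload_to))

    def get_filename(self, filename):
        # The only change from the original code: call the pluggable
        # normalizer instead of a hard-coded get_valid_filename().
        f = os.path.join(self.get_directory_name(),
                         self.filename_normalizer(os.path.basename(filename)))
        return os.path.normpath(f)


if __name__ == '__main__':
    default_field = FileFieldSketch(upload_to='docs')
    custom_field = FileFieldSketch(upload_to='docs',
                                   filename_normalizer=unicode_safe_filename)
    print(default_field.get_filename('ファイル.txt'))  # docs/.txt on POSIX
    print(custom_field.get_filename('ファイル.txt'))   # docs/ファイル.txt on POSIX
```

With the patch applied, the model-level equivalent would be passing `filename_normalizer=unicode_safe_filename` to FileField; whether non-ASCII names are acceptable on a given filesystem or web server stays a per-project decision.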

I'll post the same patch (including documentation additions) to #3119.


tsuyuki makoto wrote:
> I'd like to ask for comments on #3119: problems on FileField/ImageField
> with multi-byte filenames. Since this problem is caused by two reasons,
> let me describe them step by step.
>
> Multibyte characters in a filename are lost in get_valid_filename().
> --------------------------------------------------------------------
>
> In django.db.models.fields, FileField and its subtypes call
> django.utils.text.get_valid_filename() to remove all "filename-unsafe"
> characters from the given filename. The resulting filename consists only
> of ASCII letters, digits, hyphens, underscores and dots. However, this
> behaviour has an undesirable effect in countries where multibyte
> filenames are common. For example, if the original filename consists
> entirely of multibyte characters plus a '.txt' extension (such as
> 'ファイル.txt'), the resulting filename becomes '.txt' (no filename body,
> only the extension).
>
> Underscore-suffix uniquification easily collapses
> -------------------------------------------------
>
> Things get worse when there are many such files: since FileField
> inserts underscores into the filename (before the extension) until it
> becomes unique, files named ['壱号文書.doc', '弐号文書.doc', '参号文書.doc', ...]
> are stored as ['.doc', '_.doc', '__.doc', ...].
> When the number of underscores reaches the maxlength of the filename
> field (100 or so), FileField begins to raise errors because the length
> of the filename exceeds the limit.
>
> Proposed solution: punycode conversion before calling
> django.utils.text.get_valid_filename().
> ----------------------------------------------------------------------------
> Add STORE_FILENAME_AS_PUNYCODE to global_settings, False by default.
> If STORE_FILENAME_AS_PUNYCODE is True, encode the given filename in
> punycode, except for the extension.
> Then generate a clean filename with get_valid_filename() and return it.
>
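The two problems in the quoted report, and the punycode workaround, can be reproduced with a small standalone sketch. `ascii_only_filename` only approximates get_valid_filename(), `uniquify` mimics the underscore insertion described above, and `normalize_filename` with its `store_as_punycode` flag is a hypothetical stand-in for the proposed STORE_FILENAME_AS_PUNYCODE setting; none of these names exist in Django.

```python
import os
import re


def ascii_only_filename(s):
    # Approximation of get_valid_filename(): spaces become underscores,
    # then everything outside [-A-Za-z0-9_.] is removed.
    s = s.strip().replace(' ', '_')
    return re.sub(r'[^-A-Za-z0-9_.]', '', s)


def uniquify(filename, existing):
    # Mimics the reported uniquification: while the name is taken, insert
    # an underscore just before the last dot (or append one if none).
    while filename in existing:
        dot_index = filename.rfind('.')
        if dot_index == -1:
            filename += '_'
        else:
            filename = filename[:dot_index] + '_' + filename[dot_index:]
    return filename


def normalize_filename(s, store_as_punycode=False):
    # Hypothetical version of the proposal: when the flag is set, encode
    # the stem as punycode (ASCII only) and keep the extension unchanged,
    # then run the ASCII-only filter on the result.
    if store_as_punycode:
        stem, ext = os.path.splitext(s)
        s = stem.encode('punycode').decode('ascii') + ext
    return ascii_only_filename(s)


def store_all(names, store_as_punycode=False):
    # Normalize and uniquify each name against those already stored.
    stored = []
    for name in names:
        stored.append(uniquify(normalize_filename(name, store_as_punycode), stored))
    return stored


if __name__ == '__main__':
    docs = ['壱号文書.doc', '弐号文書.doc', '参号文書.doc']
    print(store_all(docs))                          # ['.doc', '_.doc', '__.doc']
    print(store_all(docs, store_as_punycode=True))  # three distinct ASCII names
```

Because punycode is reversible, a stem made up entirely of non-ASCII characters (as here) passes the ASCII filter unchanged and can be recovered with `.decode('punycode')`. The same function could equally be supplied through the proposed `filename_normalizer` parameter instead of a global setting.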


-- 
Yasushi Masuda
http://ymasuda.jp/



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
