Makoto, Michael, Gabor, Ivan:
Following the discussion, I've come to think that the definition (or
policy) of a "valid filename (for storage)" varies with the circumstances
and the field each developer works in, so it would be hard to provide one
flawless way of normalizing filenames. Moreover, altering
get_valid_filename() may break a lot of existing code -- that would
be unacceptable for most developers who have already written apps
with FileField.
Instead, I'd like to propose keeping get_valid_filename() as the default
filename normalization behaviour, and adding a
``filename_normalizer`` parameter to FileField's constructor, like this::
{{{
Index: db/models/fields/__init__.py
===================================================================
--- db/models/fields/__init__.py (revision 4447)
+++ db/models/fields/__init__.py (working copy)
@@ -580,8 +580,10 @@
         return forms.EmailField(**defaults)

 class FileField(Field):
-    def __init__(self, verbose_name=None, name=None, upload_to='', **kwargs):
+    from django.utils.text import get_valid_filename
+    def __init__(self, verbose_name=None, name=None, upload_to='', filename_normalizer=get_valid_filename, **kwargs):
         self.upload_to = upload_to
+        self.filename_normalizer = filename_normalizer
         Field.__init__(self, verbose_name, name, **kwargs)

     def get_manipulator_fields(self, opts, manipulator, change, name_prefix='', rel=False, follow=True):
@@ -656,8 +658,7 @@
         return os.path.normpath(datetime.datetime.now().strftime(self.upload_to))

     def get_filename(self, filename):
-        from django.utils.text import get_valid_filename
-        f = os.path.join(self.get_directory_name(), get_valid_filename(os.path.basename(filename)))
+        f = os.path.join(self.get_directory_name(), self.filename_normalizer(os.path.basename(filename)))
         return os.path.normpath(f)

 class FilePathField(Field):
}}}
With this change, you can specify your own filename normalization code that
fits your particular requirements. The change does not affect any existing
code, because the present get_valid_filename() remains the default.
I'll post the same patch (including documentation additions) to #3119.
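
As a rough illustration of the kind of function one could pass in, here is
a sketch of a normalizer that keeps letters and digits from any script.
The name ``unicode_preserving_normalizer`` is hypothetical, and the code
assumes present-day Python 3, where ``\w`` in a str pattern matches Unicode
word characters. The ``filename_normalizer`` parameter exists only in the
patch above.

```python
import re
import unicodedata


def unicode_preserving_normalizer(filename):
    """Hypothetical normalizer: keep word characters from any script."""
    # Normalize to NFC so equivalent character sequences compare equal.
    name = unicodedata.normalize("NFC", filename.strip())
    name = name.replace(" ", "_")
    # Drop everything except Unicode word characters, dots and hyphens.
    return re.sub(r"[^\w.-]", "", name)


# Hypothetical usage, assuming the patch above has been applied:
#   attachment = models.FileField(
#       upload_to='docs',
#       filename_normalizer=unicode_preserving_normalizer)

print(unicode_preserving_normalizer("ファイル.txt"))   # ファイル.txt
print(unicode_preserving_normalizer("my file?.txt"))  # my_file.txt
```

Whether such names are safe depends on the filesystem and its encoding,
which is exactly the kind of policy decision the parameter would leave to
each project.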
tsuyuki makoto wrote:
> I'd like to ask for comments on #3119: problems in FileField/ImageField
> with multi-byte filenames. Since this problem has two causes,
> let me describe them step by step.
>
> Multibyte characters in a filename are lost in get_valid_filename().
> --------------------------------------------------------------------
>
> As in django.db.models.fields, FileField and its subclasses call
> django.utils.text.get_valid_filename() to remove all "filename-unsafe"
> characters from the given filename. The resulting filename consists
> of ASCII letters, digits, hyphens, underscores and periods. However, this
> behaviour has an undesirable effect in countries where multibyte filenames
> are used. For example, if the original filename consists entirely of
> multibyte characters plus a '.txt' extension (such as 'ファイル.txt'), the
> resulting filename becomes '.txt' (only the extension, with no filename body).
>
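
To make the effect concrete, here is a small sketch.
``get_valid_filename_sketch`` is my approximation of what
django.utils.text.get_valid_filename() does (replace spaces with
underscores, then drop anything outside a small ASCII set). It is an
assumption for illustration, not a copy of the Django source.

```python
import re


def get_valid_filename_sketch(s):
    # Assumed approximation of django.utils.text.get_valid_filename().
    s = s.strip().replace(" ", "_")
    # Keep only ASCII letters, digits, hyphens, underscores and dots.
    return re.sub(r"[^-A-Za-z0-9_.]", "", s)


print(get_valid_filename_sketch("my report.doc"))  # my_report.doc
print(get_valid_filename_sketch("ファイル.txt"))     # .txt
```
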
> Underscore-suffix uniquification easily collapses
> -------------------------------------------------
>
> Things get worse if we have a lot of such files: since FileField
> adds underscores to the filename until it becomes unique,
> if we have files named ['壱号文書.doc', '弐号文書.doc', '参号文書.doc', ...],
> then the stored filenames will become ['.doc', '_.doc', '__.doc', ...].
> When the number of underscores reaches the maxlength of the filename field
> (100 or so), FileField will start raising errors because the length
> of the filename exceeds the limit.
>
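
The growth can be simulated without touching the filesystem. In this
sketch a set stands in for the files already on disk, ``sanitize`` is the
same assumed approximation of get_valid_filename(), and ``MAX_LENGTH`` is
an assumed limit taken from the "100 or so" above. The uniquification rule
(insert an underscore before the last dot) is my reading of the behaviour
described, not the actual FileField code.

```python
import re

MAX_LENGTH = 100  # assumed limit, from "100 or so" in the description


def sanitize(s):
    # Assumed approximation of get_valid_filename().
    s = s.strip().replace(" ", "_")
    return re.sub(r"[^-A-Za-z0-9_.]", "", s)


def store(filename, taken):
    """Return the name that would be stored, given the names already taken."""
    name = sanitize(filename)
    while name in taken:
        dot = name.rfind(".")
        if dot == -1:
            name += "_"
        else:
            # Insert one underscore just before the extension.
            name = name[:dot] + "_" + name[dot:]
    taken.add(name)
    return name


taken = set()
# 120 distinct one-character CJK filenames, all reduced to '.doc'.
names = [store(chr(0x4E00 + i) + ".doc", taken) for i in range(120)]

print(names[:3])  # ['.doc', '_.doc', '__.doc']
too_long = [n for n in names if len(n) > MAX_LENGTH]
print(len(too_long), "of", len(names), "names exceed", MAX_LENGTH, "characters")
```
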
> Proposed solution: punycode conversion before calling
> django.utils.text.get_valid_filename()
> ----------------------------------------------------------------------------
> Add STORE_FILENAME_AS_PUNYCODE to global_settings, False by default.
> If STORE_FILENAME_AS_PUNYCODE is True, encode the given filename
> (except its extension) in punycode.
> Then generate a clean filename with get_valid_filename() and return it.
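
For reference, the encoding step in this proposal might look roughly like
the sketch below. It uses the "punycode" codec from the Python standard
library; ``punycode_filename`` is a hypothetical helper name, and the
STORE_FILENAME_AS_PUNYCODE check is left out.

```python
import os


def punycode_filename(filename):
    """Encode the filename body as punycode, leaving the extension alone."""
    root, ext = os.path.splitext(filename)
    # The codec returns bytes; the result is pure ASCII.
    return root.encode("punycode").decode("ascii") + ext


for original in ["壱号文書.doc", "弐号文書.doc", "参号文書.doc"]:
    # Each distinct name gets a distinct ASCII-only encoding.
    print(original, "->", punycode_filename(original))
```

The encoded names are hard for people to read, which is one more reason
to let each project choose its own normalizer.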
--
Yasushi Masuda
http://ymasuda.jp/