Some way to store binary data in the database seems to be a
perennial request. I've seen comments (I think Adrian said it "opens
a can of mutated worms."), but never a real discussion of what the
problems would be.
There's a recent ticket, #2417, that adds support for "small-ish"
bits of binary data using a BinaryField. It uses the VARBINARY type
in MySQL, which is limited to 64k bytes of data. It also subclasses
CharField, so it will show up in the Admin, which may or may not be a
good idea.
The main problems I see with dealing with binary types in the
database involve making sure that you never have too much stuff in
memory at any time, either while you're loading the file into memory
or while you're outputting it to the HttpResponse. I solved this in a
Java webapp by breaking the file into chunks, storing each chunk
separately in the database, and uploading and downloading only a
chunk at a time.
Here's what I'm suggesting:
class DatabaseFile(models.Model):
    name = models.CharField(maxlength=80)
    content_type = models.TextField()
    last_modified = models.DateTimeField()
    size = models.IntegerField()
    owner = models.ForeignKey(User, blank=True)

class DatabaseFileChunk(models.Model):
    file = models.ForeignKey(DatabaseFile)
    number = models.IntegerField()
    content = models.BinaryField()  # implemented with bytea in
                                    # Postgres and BLOB in MySQL
When a file is uploaded and the developer wants it to go into the
database, only about 64kb of data (this seems a reasonable chunk
size) is read into memory at a time and stored in a
DatabaseFileChunk, numbered consecutively from 0 up to however many
chunks are needed.
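The chunking step can be sketched without the ORM. Assuming a chunk size of 64kb (an arbitrary choice, as above), a generator like this would produce the (number, content) pairs, each of which would become one DatabaseFileChunk row:

```python
import io

CHUNK_SIZE = 64 * 1024  # 64kb per chunk; an assumption, not a requirement

def split_into_chunks(uploaded_file, chunk_size=CHUNK_SIZE):
    """Yield (number, content) pairs, reading at most chunk_size bytes
    of the uploaded file into memory at any one time."""
    number = 0
    while True:
        content = uploaded_file.read(chunk_size)
        if not content:
            break
        yield number, content
        number += 1

# Each pair would then be saved as a chunk row, roughly:
#   for number, content in split_into_chunks(f):
#       DatabaseFileChunk(file=db_file, number=number, content=content).save()
```

Because the generator never holds more than one chunk, memory use stays flat no matter how large the upload is.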
A DatabaseFile would be a file-like object, iterable, and would be
output to an HttpResponse a chunk at a time, so never more than about
64k of server memory is used to serve the file from the database.
Yes, this will be slower than having Apache serve the file directly,
but it has the huge advantage that the file is served as the result
of a view. That means you can do all kinds of interesting permission
checking, url mapping, and general futzing around internal to Django,
without having to interact with whichever web server you're using.
In a local test with fairly big BLOBs (images of about 750kb), files
were served almost instantaneously by Django's development server, so
I think the performance hit would likely be acceptable to people who
really want the ability to save files in the database rather than on
the filesystem. (And I, for one, desperately need that flexibility.
I'm going to have about 1900 users, and I don't want to have to route
files to folders, set Apache permissions, etc., when Django has such
a nice API for handling relations.)
I'm going to try to code this up this afternoon, but please let me
know if anyone sees huge problems with it.
Thanks,
Todd
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---