Re: Sending large, generated files

Rick Wagner Wed, 15 Apr 2009 08:38:24 -0700

On Apr 14, 6:55 pm, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> On Apr 15, 7:49 am, Alex Loddengaard <a...@cloudera.com> wrote:
>
> > I've found several messages on this list discussing ways to send large files
> > in a HttpResponse.  One can use FileWrapper, or one can use a generator and
> > yield chunks of the large file.  What about the case when the large file is
> > generated at HTTP request time?  In this case, it would be annoying to have
> > the user wait for the page to generate the large file and then stream the
> > file.  Instead we would want a way to start the HTTP response (so that the
> > user gets the download dialogue), generate the large file, and then stream
> > the file.  Let's take the following example:
>
> > def create_tarball():
> >
> >   path = create_some_big_tarball()
> >
> >   fh = open(path, 'rb')  # binary mode: the tarball is not text
> >   try:
> >     while True:
> >       chunk = fh.read(1024 * 128)
> >       if not chunk:
> >         break
> >       yield chunk
> >   finally:
> >     fh.close()
> >
> > def sample_view(request):
> >   response = HttpResponse(create_tarball(),
> >                           mimetype='application/x-compressed')
> >   response['Content-Disposition'] = "attachment;filename=mytarball.tar.gz"
> >   return response
>
> > The above example nearly accomplishes what we want, but it doesn't start the
> > HTTP response before the tarball is created, hence making the user wait a
> > long time before the download dialogue box shows up.  Let's try something
> > like this (notice the addition of a noop yield):
>
> > def create_tarball():
> >
> >   yield ''  # noop to send the HTTP headers
> >
> >   path = create_some_big_tarball()
> >
> >   fh = open(path, 'rb')  # binary mode: the tarball is not text
> >   try:
> >     while True:
> >       chunk = fh.read(1024 * 128)
> >       if not chunk:
> >         break
> >       yield chunk
> >   finally:
> >     fh.close()
> >
> > def sample_view(request):
> >   response = HttpResponse(create_tarball(),
> >                           mimetype='application/x-compressed')
> >   response['Content-Disposition'] = "attachment;filename=mytarball.tar.gz"
> >   return response
>
> > The issue with the above example is that the "yield ''" seems to be
> > ignored.  HTTP headers are not sent before the tarball is created.
> > Similarly, "yield ' '" and "yield None" don't work, because they corrupt the
> > tarball (HttpResponse calls str() on the iterable items given to the
> > HttpResponse constructor).  As a temporary solution, we're writing an empty
> > gzip file in the first yield.  Our large tarball is gzipped, and since gzip
> > files can be concatenated to one another, our hack seems to be working.
> > In the above example, replace the first "yield ''" with:
>
> >   # requires "import gzip" and "import io" at module level
> >   noop = io.BytesIO()
> >   empty = gzip.GzipFile(mode='wb', fileobj=noop)
> >   empty.write(b"")
> >   empty.close()
> >   yield noop.getvalue()
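The concatenation hack works because a gzip stream may consist of several members, which gzip decompressors read one after another. A quick check with Python 3's standard library (the helper `empty_gzip_member` and the sample `payload` are made up for illustration) that a leading empty member leaves the decompressed data unchanged:

```python
import gzip
import io


def empty_gzip_member():
    """Return the bytes of a valid gzip member with no content."""
    buf = io.BytesIO()
    with gzip.GzipFile(mode="wb", fileobj=buf):
        pass  # write nothing; closing emits the trailer after the header
    return buf.getvalue()


# Made-up stand-in for the real gzipped tarball.
payload = b"pretend this is a tar archive\n" * 1000
real_member = gzip.compress(payload)

# What the client receives: the empty member first, then the real data.
stream = empty_gzip_member() + real_member

# gzip.decompress() handles multi-member data and joins the output.
assert gzip.decompress(stream) == payload
print(len(empty_gzip_member()), "bytes sent before the real data")
```

Whether every client tool copes with multi-member streams is worth checking separately; Python's `gzip.decompress()` and the gzip utility do.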
>
> > I'm wondering if there is a better way to accomplish this?  I don't quite
> > understand why HTTP responses are written to stdout.  Possibly orthogonal to
> > that, it seems like, in theory, yielding an empty value in the generator
> > should work, because a flush is called after the HTTP headers are written.
> > Any ideas, either on how to solve this problem with the Django API, or on
> > why Django doesn't send HTTP headers on a "yield ''"?
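The ignored `yield ''` is consistent with the WSGI specification (PEP 333, and PEP 3333 for Python 3): `start_response` only stores the status and headers, and the server must not transmit them until the application's iterable yields its first non-empty string (or is exhausted). The toy gateway below is an illustrative sketch of that rule, not code from any real server; `toy_gateway` and `slow_app` are hypothetical names.

```python
import time


def toy_gateway(app, environ, out):
    """Run a WSGI app, deferring headers until the first non-empty chunk."""
    state = {}

    def start_response(status, headers, exc_info=None):
        state["status"], state["headers"] = status, headers  # stored, not sent

    def send_headers():
        if not state.get("sent"):
            out.append(("HEADERS", state["status"]))
            state["sent"] = True

    result = app(environ, start_response)
    try:
        for chunk in result:
            if not chunk:
                continue  # empty strings do not trigger the headers
            send_headers()
            out.append(("BODY", chunk))
        send_headers()  # an empty body still gets its headers at the end
    finally:
        if hasattr(result, "close"):
            result.close()
    return out


def slow_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/x-gzip")])
    yield b""          # intended as a "send the headers now" hint
    time.sleep(0.1)    # stands in for building the big tarball
    yield b"tarball bytes"


events = toy_gateway(slow_app, {}, [])
# Headers go out only together with the first real chunk:
print(events)  # [('HEADERS', '200 OK'), ('BODY', b'tarball bytes')]
```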
>
> From memory, file wrappers at the Django level, in order to work across
> the different hosting mechanisms supported, only allow a file name to be
> supplied. At the WSGI level the file wrapper actually takes a file-like
> object. If you were doing this in raw WSGI, you could run your
> tarball creation as a separately exec'd pipeline and, rather than
> create a file in the file system, have tar write to the pipeline,
> i.e., use '-' instead of a filename. The file object resulting from the
> pipeline could then be used as input to the WSGI file wrapper object.
>
> So, if this operation isn't somehow bound into needing Django itself,
> and this is important to you, maybe you should create a separate
> little WSGI application just for this purpose.
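A minimal sketch of that raw-WSGI approach, assuming a Unix `tar` on the PATH: tar writes the gzipped archive to stdout ('-'), and the pipe is handed to the server's `wsgi.file_wrapper`, falling back to the standard library's `wsgiref.util.FileWrapper` when the server provides none. `make_tarball_app`, `PipeReader` and the directory `/path/to/data` are hypothetical.

```python
import subprocess
from wsgiref.util import FileWrapper

BLOCK_SIZE = 128 * 1024


class PipeReader(object):
    """File-like object over a child process's stdout (hypothetical helper)."""

    def __init__(self, command):
        self.proc = subprocess.Popen(command, stdout=subprocess.PIPE)

    def read(self, size=-1):
        return self.proc.stdout.read(size)

    def close(self):
        # Reached via the file wrapper's close() when the response is done.
        self.proc.stdout.close()
        self.proc.wait()  # reap the child so it does not linger as a zombie


def make_tarball_app(command):
    def tarball_app(environ, start_response):
        reader = PipeReader(command)
        start_response("200 OK", [
            ("Content-Type", "application/x-gzip"),
            ("Content-Disposition", "attachment;filename=mytarball.tar.gz"),
        ])
        # Prefer the server's optimised wrapper if it offers one.
        wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
        return wrapper(reader, BLOCK_SIZE)
    return tarball_app


# Hypothetical directory; "f -" makes tar write the archive to stdout.
application = make_tarball_app(["tar", "czf", "-", "/path/to/data"])
```

Since the archive is streamed as tar produces it, the first bytes, and with them the headers, can reach the client long before the whole archive has been built, and no temporary file is needed.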
>
> Actually, even if it is bound into needing Django, you may still be able to
> do it. Using mod_wsgi, you could even delegate the special WSGI
> application to run in the same process as Django and mount it at a URL
> that appears to be within the Django application. Because you are
> side-stepping Django's dispatch, though, you couldn't have it protected by
> Django's form-based authentication.
>
> Graham
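Graham's suggestion is about mod_wsgi configuration; a rough pure-WSGI equivalent, offered only as a sketch under that substitution, is a small dispatcher that routes one URL prefix to the special application and everything else to Django in the same process. `PrefixDispatcher`, the `/downloads/tarball` prefix and the `django_app`/`tarball_app` names are hypothetical; as Graham notes, requests under the prefix bypass Django entirely, including its authentication.

```python
class PrefixDispatcher(object):
    """Send requests under `prefix` to `special_app`, the rest to `default_app`."""

    def __init__(self, default_app, prefix, special_app):
        self.default_app = default_app
        self.prefix = prefix.rstrip("/")
        self.special_app = special_app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # Move the matched prefix from PATH_INFO to SCRIPT_NAME, the
            # usual convention for an application mounted at a sub-URL.
            environ = dict(environ)
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):]
            return self.special_app(environ, start_response)
        return self.default_app(environ, start_response)


# Example wiring (hypothetical): django_app could be
# django.core.handlers.wsgi.WSGIHandler(), tarball_app the streaming app.
# application = PrefixDispatcher(django_app, "/downloads/tarball", tarball_app)
```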

Hi,

First, the FileWrapper class in django/core/servers/basehttp.py
accepts file-like objects, i.e., ones that have a read method. That
leads me to suggest that your solution may be to write your own
FileWrapper-style class that gets the file on the first iteration.
Here's a modified, untested version of FileWrapper:

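class BigTarFileWrapper(object):
    """Iterate over a big tarball in blocks, creating it on first use."""

    def __init__(self, tar_args, blksize=8192):
        self.filelike = None
        self.tar_args = tar_args
        self.blksize = blksize

    def _read_block(self):
        # Build the tarball only when the first block is requested.
        # get_some_big_tarball() is your own (hypothetical) helper; it must
        # return a file-like object opened in binary mode.
        if self.filelike is None:
            self.filelike = get_some_big_tarball(self.tar_args)
        return self.filelike.read(self.blksize)

    def __getitem__(self, key):
        # Old-style iteration protocol, kept as in the original FileWrapper.
        data = self._read_block()
        if data:
            return data
        raise IndexError

    def __iter__(self):
        return self

    def next(self):
        data = self._read_block()
        if data:
            return data
        raise StopIteration

    __next__ = next  # same method under its Python 3 name

    def close(self):
        # Mirrors FileWrapper, which exposes the underlying file's close().
        if self.filelike is not None:
            self.filelike.close()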

Then your view becomes something like this:

def sample_view(request, args):
    tar_iterator = BigTarFileWrapper(args)
    response = HttpResponse(tar_iterator,
                            mimetype='application/x-compressed')
    response['Content-Disposition'] = "attachment;filename=mytarball.tar.gz"
    return response
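To see the deferral in isolation, here is a generic variant of the wrapper (a sketch; `LazyFileWrapper`, `fake_opener` and the in-memory data are hypothetical) that takes a zero-argument callable instead of tar arguments and only calls it when the first block is requested:

```python
import io


class LazyFileWrapper(object):
    """Iterate over a file-like object that is only opened on first use."""

    def __init__(self, opener, blksize=8192):
        self.opener = opener      # zero-argument callable -> binary file-like
        self.blksize = blksize
        self.filelike = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.filelike is None:
            self.filelike = self.opener()  # the expensive work happens here
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise StopIteration

    next = __next__  # Python 2 spelling, as in BigTarFileWrapper

    def close(self):
        if self.filelike is not None:
            self.filelike.close()


calls = []


def fake_opener():
    # Hypothetical stand-in for get_some_big_tarball(tar_args).
    calls.append("opened")
    return io.BytesIO(b"a" * 20000)


wrapper = LazyFileWrapper(fake_opener)
assert calls == []                  # nothing built yet
chunks = list(wrapper)
assert calls == ["opened"]          # built exactly once, on first iteration
assert [len(c) for c in chunks] == [8192, 8192, 3616]
wrapper.close()
```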

This was inspired by the snippet that you may have seen [1], and by my
own experience of returning files from an external storage system
using my own iterator class.

--Rick

[1] http://www.djangosnippets.org/snippets/365/

You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com