Have we even tested serving large files through the app stack?  I strongly
suspect they'd hit the long-request timeout.  I know I've hit it before
when testing uploads of large-ish attachments.

And on the subject of attachments, the API endpoints already (or will, with
the next push) include attachment metadata, including the URL to download
each one from.  I definitely think that's good enough for now, as an admin
can parse the URLs out and download the files if needed.  If that proves too
onerous for doing project exports, then we can address it at that time.
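
To make the "parse the URLs out" step concrete, here is a minimal sketch in
Python.  It assumes, hypothetically, that each artifact in the API JSON
carries an "attachments" list whose entries have a "url" field; the real
field names may differ, so treat both (and the sample data) as placeholders.

```python
import json


def attachment_urls(node):
    """Recursively collect attachment URLs from decoded API JSON.

    Assumes (hypothetically) that attachments appear as a list under an
    "attachments" key and that each entry is a dict with a "url" field.
    """
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "attachments" and isinstance(value, list):
                urls.extend(
                    item["url"]
                    for item in value
                    if isinstance(item, dict) and "url" in item
                )
            else:
                # Keep descending: attachments may be nested, e.g. on posts.
                urls.extend(attachment_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(attachment_urls(item))
    return urls


# Made-up sample data, shaped like a ticket with one discussion post.
sample = json.loads("""
{"tickets": [{"summary": "Crash on save",
              "attachments": [{"url": "https://example.invalid/a/log.txt"}],
              "discussion": {"posts": [
                  {"attachments": [{"url": "https://example.invalid/a/shot.png"}]}
              ]}}]}
""")

for url in attachment_urls(sample):
    print(url)
```

The resulting list could then be fed to whatever download tool the admin
prefers.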

Going back to serving up the exports, is there any way we could serve them
outside of the app stack but still with authentication?  Such as a
standalone, light-weight service that just serves files with authentication
(could be useful for the screenshots and icons for private projects), or
via authenticated SFTP?  This is verging on an infrastructure question at
this point.  I definitely agree that we should have some auth in front of
it, though it's not going to be easy.
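
One possible shape for such a standalone service, sketched in Python: the
app stack hands an authenticated admin a signed, expiring link, and the
light-weight file server only has to verify the signature before serving.
This is just one option, not something we've decided on; SECRET_KEY,
sign_path(), verify(), and the URL layout are all made up for illustration.

```python
import hashlib
import hmac
import time

# Hypothetical shared secret, known to both the app stack and the file service.
SECRET_KEY = b"change-me"


def sign_path(path, ttl_seconds=3600, now=None):
    """Return (expires, signature) for a download path."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    message = f"{path}:{expires}".encode("utf-8")
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return expires, signature


def verify(path, expires, signature, now=None):
    """Check the signature and expiry before serving the file."""
    now = time.time() if now is None else now
    if now > expires:
        return False
    message = f"{path}:{expires}".encode("utf-8")
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking the signature via timing.
    return hmac.compare_digest(expected, signature)


expires, sig = sign_path("/exports/myproject/export.zip")
print(f"/exports/myproject/export.zip?expires={expires}&sig={sig}")
```

With something like this, the file service never needs access to the app's
sessions; it only needs the shared secret.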


On Tue, Jun 18, 2013 at 10:26 AM, Dave Brondsema <[email protected]> wrote:

> For us at SourceForge, we have a need to build a feature that lets project
> admins download a backup/export of all their project data.  Since this is a
> pretty big feature, I wanted to propose here how we might do it and get
> feedback & ideas before we proceed.
>
> Add a bulk_export() method to Application which would be responsible for
> generating json for all the artifacts in the tool.  The format should
> match the API format for artifacts so that we're consistent.  Thus any tool
> that implements bulk_export() would typically loop through all the
> artifacts for this instance (matching app_config_id) and convert to json
> the same way the API json is generated (e.g. call the __json__ method or
> RestController method; some refactoring might be needed).  Multiple types
> of artifacts/objects could be listed out in groups, e.g. Tracker app could
> have a list of tickets, list of saved search bins, list of milestones, and
> the tracker config data.  Discussion threads would need to be included too,
> ideally inline with the artifact they go with.  No permission checks would
> be done since this export would only be available to admins (makes it
> faster & simpler).
>
> Provide a page on the Admin sidebar to generate a bulk export.  Project
> admins could choose individual tool instances, or all tools in the project
> (that support it).  That form would kick off a background task which goes
> through the selected tools and runs their bulk_export() methods.  Save each
> tool's data as mount_point.json and zip them all together.
>
> It'd be easiest to store & deliver the zip files similarly to the code
> snapshots (static files not served through allura), but that won't be
> secure.  We'll need to either serve it through allura with authentication,
> or maybe name the zip file with a random name that can't be guessed (and
> then serve it directly through apache or nginx).  Other ideas?
>
> When the task is complete, notify the user.  What way is best?  Send an
> email?  Probably would be good to show a listing of available completed
> extracts on the extract page, so if any older ones are still sitting around
> they can be retrieved (would be up to server admins to have a cron to
> delete old files)
>
> We could make this something that can be triggered automatically via the
> API and check status through the API, but that seems like a good thing to
> add on later.
>
> Should we include attachments?  These would be important in some cases but
> not in others.  It could also increase the export size immensely in some
> cases.  Maybe leave out for now, and add in later when needed, possibly as
> an option.
>
> Further thoughts on implementation details:
>
> So that a giant json string doesn't have to be held in memory for each
> tool, the export task should open a file handle for mount_point.json and
> call bulk_export() with that open file handle and each App can append to
> their file incrementally.
>
> If mongo performance is slow, some refactoring may be needed to avoid lots
> of individual mongo calls and be more batch oriented.  We can see how it
> goes.
>
> Could parallelize bulk_export() later, to do multiple tools at once.
>
>
> Sound reasonable?  Any suggestions or other ideas?
>
>
> --
> Dave Brondsema : [email protected]
> http://www.brondsema.net : personal
> http://www.splike.com : programming
>               <><
>
