I'm planning a self-service publishing/reporting system.  Django will
likely be the web framework.  I am considering a rather unusual
storage architecture, and would love to know where the pitfalls are.

Users will be companies in a small vertical space.  They want to let
their clients (effectively the public) pull certain kinds of reports.
Users will occasionally log in to 'design stuff' - controlling the
appearance of the reports - or to upload new content.  During this
period they will be working with a fairly rich object graph.  The
report tools they have designed will then be available as web services
or web forms, 24x7, to let the public request content.  It ought to
scale up to multiple machines, but is not likely to be a web-scale
app.

Concrete example:  a tour operator logs in from time to time to update
hotel descriptions and destination guides, and to tweak the style of
their publications.  A member of the public can use the system to
input parameters and then get a brochure of all hotels meeting their
price/location/feature criteria.
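
To make that concrete, the public-facing query is roughly this (the
field names are invented for the example):

    def matching_hotels(hotels, max_price=None, location=None,
                        features=()):
        # 'hotels' is a list of plain dicts loaded from a client's
        # files
        for h in hotels:
            if max_price is not None and h['price'] > max_price:
                continue
            if location and h['location'] != location:
                continue
            if not set(features) <= set(h['features']):
                continue
            yield h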

It would be easy and natural to persist this as a few JSON objects
(or marshalled/pickled Python data).  It would be a huge PITA to
decompose it into an RDBMS, especially as the 'schema' is bound to
evolve.  ACID is not important.
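
Something like this would be most of the storage layer (the paths and
helper names are just for illustration):

    import json

    def save_doc(path, obj):
        # pretty-printed, key-sorted JSON gives stable, line-oriented
        # diffs that version control handles well
        with open(path, 'w') as f:
            json.dump(obj, f, indent=2, sort_keys=True)

    def load_doc(path):
        with open(path) as f:
            return json.load(f)

    hotels = [{'name': 'Hotel Bella', 'price': 120,
               'location': 'Rome', 'features': ['pool']}]
    save_doc('clients/acme-tours/hotels.json', hotels)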

I would like an architecture which
 - is easy to work on.  A developer should be able to set up a working
copy fast, and it should be easy to set up on a new server.  The fewer
technologies apart from Python, the better.
 - is easy to support:  if a client is having problems, we want to
quickly be able to replicate their environment (content+code) on a
development machine
 - tracks history - maybe not everything, but makes it possible for
clients to save and undo at certain points
 - can provide redundancy later on if needed.

One solution seems absurdly simple and versatile:
 - each client gets a directory on the server(s)
 - their content (images/datasets) and configuration data live in
files under this.  Where possible we store data in line-oriented files
(Python objects saved as pretty-printed JSON or repr'd, tabular data
in delimited files).
 - when the user edits stuff in the web interface, we update a file on
the server.
 - we check everything that matters (app code and client content)
into Mercurial, Git or something similar.  Commit at midnight just in
case, and also when clients want to, log out, or approve a set of
changes (there's a sketch of the commit step after this list).
 - we can configure new servers just by checking out a fresh copy
 - if we want a cluster to preserve transactional data, we could also
run rsync or unison frequently between the servers
 - we can easily have multiple 'publishing' servers and one 'editing'
one, and control publishing/promotion
 - if we need to 'shard' and publish some clients to some servers,
that's easy.
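
The commit step mentioned above is only a few lines (assuming Git
here; Mercurial would be nearly identical):

    import subprocess

    def commit_client(client_dir, message):
        # stage and commit everything under one client's directory
        subprocess.check_call(['git', 'add', '-A', '.'],
                              cwd=client_dir)
        # 'git commit' exits nonzero when there is nothing new to
        # commit, so don't treat that as an error
        subprocess.call(['git', 'commit', '-m', message],
                        cwd=client_dir)

A cron job can call the same function at midnight, and the web app
can call it when a client logs out or approves a set of changes.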

I would still use a database for Django's auth application, but not
much else.  And if this architecture turns out to be wrong, we can
change the storage layer later.
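
In Django terms the settings would stay minimal; something like this
(the SQLite path is an assumption):

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': '/srv/app/auth.db',
        }
    }

    INSTALLED_APPS = (
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        # ...our own apps, which talk to the filesystem, not the ORM
    )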

I know that we need to take some precautions to make sure processes
don't clash trying to write files, but we're not talking about massive
concurrency.
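
The simplest precaution I know of is to write to a temp file and
rename it into place, so a reader never sees a half-written file;
sketch of the helper:

    import os
    import tempfile

    def atomic_write(path, data):
        # write 'data' (a byte string) to a temp file in the same
        # directory, then rename it over the target; rename is atomic
        # when source and target are on the same POSIX filesystem
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
        try:
            os.write(fd, data)
        finally:
            os.close(fd)
        os.rename(tmp, path)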

So, why don't I hear about architectures like this?  Why would I want
to use more complex things (CouchDB, ZODB, blobs-in-RDBMS-tables)?
Has anyone built a nontrivial system this way, and what happened?

Thanks for all feedback.

Andy
