I'm planning a self-service publishing/reporting system. Django will likely be the web framework. I am considering a rather unusual storage architecture, and would love to know where the pitfalls are.
Users will be companies in a small vertical space. They want to let their clients (effectively the public) pull certain kinds of reports. Users will occasionally log in to 'design stuff' - controlling the appearance of the reports - or to upload new content, and during those sessions they will be working with a fairly rich object graph. The report tools they have designed will then be available as web services or web forms, 24x7, so the public can request content. It ought to scale up to multiple machines, but it is not likely to be a web-scale app.

Concrete example: a tour operator logs in from time to time to update hotel descriptions and destination guides, and to tweak the style of their publications. A member of the public can use the system to input parameters and get back a brochure of all hotels meeting their price/location/feature criteria.

It would be easy and natural to persist this as a few JSON objects (or marshalled/pickled Python data). It would be a huge PITA to decompose it into an RDBMS, especially as the 'schema' is bound to evolve. ACID is not important.

I would like an architecture which:

- is easy to work on: a developer should be able to set up a working copy fast, and it should be easy to set up on a new server. The fewer technologies beyond Python, the better.
- is easy to support: if a client is having problems, we want to be able to replicate their environment (content + code) quickly on a development machine.
- tracks history - maybe not everything, but it should let clients save and undo at certain points.
- can provide redundancy later on if needed.

One solution seems absurdly simple and versatile:

- each client gets a directory on the server(s).
- their content (images/datasets) and configuration data live in files under it. Where possible we store data in line-oriented files: Python objects saved as pretty-printed JSON or repr'd, tabular data in delimited files. (Rough sketch at the end of this message.)
- when a user edits something in the web interface, we update a file on the server.
- we check everything that matters (app code and client content) into Mercurial, Git or something similar. We commit at midnight just in case, and also when clients ask to, log out, or approve a set of changes. (There's a commit sketch at the end too.)
- we can configure new servers by checking out.
- if we want a cluster to preserve transactional data, we can also run rsync or unison frequently.
- we can easily have multiple 'publishing' servers and one 'editing' server, and control publishing/promotion.
- if we need to 'shard' and publish some clients to some servers, that's easy.

I would still use a database for Django's auth application, but not much else. And if this architecture turns out to be wrong, we can change the storage layer later.

I know we need to take some precautions so that processes don't clash when writing files, but we're not talking about massive concurrency. (A locking sketch is at the end as well.)

So, why don't I hear about architectures like this? Why would I want to use more complex things (CouchDB, ZODB, blobs-in-RDBMS tables)? Has anyone built a nontrivial system this way, and what happened?

Thanks for all feedback.

Andy
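P.S. To make the file-storage idea concrete, here is roughly what I mean by 'pretty-printed JSON in a per-client directory'. Untested Python 3 sketch; CLIENT_ROOT and the helper names are made up for illustration. The write-to-temp-then-rename dance is so a reader never sees a half-written file:

    import json
    import os
    import tempfile

    CLIENT_ROOT = "/srv/clients"   # made-up root; one directory per client

    def save_client_object(client, name, obj):
        # Pretty-printed JSON with sorted keys is line-oriented, so
        # Mercurial/Git diffs of client config stay readable.
        client_dir = os.path.join(CLIENT_ROOT, client)
        os.makedirs(client_dir, exist_ok=True)
        path = os.path.join(client_dir, name + ".json")
        fd, tmp = tempfile.mkstemp(dir=client_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(obj, f, indent=2, sort_keys=True)
            os.replace(tmp, path)   # atomic rename on the same filesystem
        except BaseException:
            os.unlink(tmp)
            raise

    def load_client_object(client, name):
        with open(os.path.join(CLIENT_ROOT, client, name + ".json")) as f:
            return json.load(f)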
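And the commit-on-approval idea, assuming the git CLI is on the server and each client directory lives inside a repository (Mercurial would be the same shape with hg commands). Again just a sketch:

    import subprocess

    def commit_client_changes(client_dir, message):
        # Stage everything under the client's directory, then commit only
        # if something actually changed ('git diff --cached --quiet' exits
        # non-zero when the index differs from HEAD).
        subprocess.run(["git", "add", "-A", "."], cwd=client_dir, check=True)
        staged = subprocess.run(["git", "diff", "--cached", "--quiet"],
                                cwd=client_dir)
        if staged.returncode != 0:
            subprocess.run(["git", "commit", "-m", message],
                           cwd=client_dir, check=True)

I imagine wiring this to the 'approve' view and to Django's user_logged_out signal, with a midnight cron job calling the same thing repo-wide just in case.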
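Finally, the 'processes clashing' precaution I mentioned: a coarse per-client advisory lock, which I'd expect to be plenty at our level of concurrency (POSIX only, lockfile name made up):

    import fcntl
    import os
    from contextlib import contextmanager

    @contextmanager
    def client_lock(client_dir):
        # Exclusive advisory lock on a per-client lockfile; serialises
        # writers for one client without blocking the others.
        with open(os.path.join(client_dir, ".lock"), "w") as lockfile:
            fcntl.flock(lockfile, fcntl.LOCK_EX)   # blocks until free
            try:
                yield
            finally:
                fcntl.flock(lockfile, fcntl.LOCK_UN)

So a save-and-approve operation would look something like (client name invented):

    client_dir = os.path.join(CLIENT_ROOT, "acme-tours")
    with client_lock(client_dir):
        save_client_object("acme-tours", "brochure_layout", layout)
        commit_client_changes(client_dir, "acme-tours: approved layout change")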