On 10/12/2010 11:06 AM, Adrian Crum wrote:
On 10/12/2010 8:55 AM, Adam Heath wrote:
On 10/12/2010 10:25 AM, Adrian Crum wrote:
Actually, a discussion of database versus filesystem storage of content
would be worthwhile. So far there has been some hyperbole, but few
facts.
How do you edit database content? What is the procedure? Can a simple
editor be used? By simple, I mean low-level, like vi.
How do you find all items in your content store that contain a certain
text word? Can grep and find be used?
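With a filesystem store, the answer is yes out of the box. A minimal sketch, assuming a hypothetical content/ directory as the store layout:

```shell
# Build a tiny throwaway content store to search (hypothetical layout).
mkdir -p content/articles content/images
printf 'Welcome to our spring sale\n' > content/articles/sale.html
printf 'Contact us any time\n' > content/articles/contact.html

# Which files contain the word "sale"? Plain grep, no special tooling.
grep -rl 'sale' content

# List every HTML file modified in the last day, with plain find.
find content -name '*.html' -mtime -1
```

Any editor, including vi, works on these files directly, since there is no intermediate store to go through.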
How do you handle moving changes between a production server, that is
being directly managed by the client, and multiple developer
workstations, which then all have to go first to a staging server? Each
system in this case has its own set of code changes, and config+data
changes, that then have to be selectively picked for staging, before
finally being merged with production.
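When the store is plain files under git, the selective-pick step described above maps onto standard commands. A self-contained sketch, where the single local repo, branch names (dev, staging, production), and file contents are all hypothetical stand-ins for the multi-machine setup:

```shell
# One repo standing in for the content store.
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name dev
echo 'v1' > page.html
git add page.html && git commit -qm 'initial content'
MAIN=$(git symbolic-ref --short HEAD)   # default branch name varies by git version

# Developer branch with two changes; only the first is ready for staging.
git checkout -qb dev
echo 'ready change' >> page.html && git commit -qam 'header fix'
READY=$(git rev-parse HEAD)
echo 'not ready' >> page.html && git commit -qam 'work in progress'

# Staging selectively picks only the commit that is ready.
git checkout -qb staging "$MAIN"
git cherry-pick "$READY"

# Once staging is verified, production merges what staging has.
git checkout -qb production "$MAIN"
git merge -q staging
```

Code, config, and content changes all travel through the same selective workflow, because they are all just files in the tree.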
What about revision control? Can you go back in time to see what the
code+data looked like? Are there separate revision systems, one for the
database, and another for the content? And what about the code?
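If code and content share one filesystem tree under a single version-control system, one history answers all three questions. A minimal sketch, with hypothetical file names:

```shell
# One repo, code and content side by side in the same tree.
git init -q site && cd site
git config user.email dev@example.com
git config user.name dev
mkdir -p code content
echo 'render()' > code/app.js
echo 'Old headline' > content/index.html
git add -A && git commit -qm 'first revision'

echo 'New headline' > content/index.html
git add -A && git commit -qm 'second revision'

# Go back in time: what did the data look like one commit ago?
git show HEAD~1:content/index.html   # prints "Old headline"
git log --oneline -- content/        # history of just the content
```

No separate revision system for the database and another for the content: one log covers both.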
For users/systems that aren't capable of using revision control, is
there a way for them to mount/browse the content store? Think nfs/samba
here.
Storing everything directly into the filesystem lets you reuse existing
tools, that have been perfected over countless generations of man-years.
I believe Jackrabbit has WebDAV and VFS front ends that will accommodate
file system tools. Watch the movie:
http://www.day.com/day/en/products/crx.html
A front end is the wrong approach. The store itself is still in some other
system (a database). The raw store needs to be managed by
the filesystem, so standard tools can move it between locations, do backups,
etc. Putting yet another layer on top just to emulate
file access is the wrong way.
<brainstorming>
Let's make a content management system. Yeah! Let's do it! So, we need to be
able to search for content, and maintain links
between related items. Let's write brand new code to do that, and put it in
the database. Ok, next, we need to pull the
information out of the database and serve it through an http server. Oh, damn,
it's not running
fast. Let's have a cache that resides someplace faster than the database. Oh,
I know, memory! Shit, it's using too much
memory. Let's put the cache in the filesystem. Updates now remove the cache,
and have it get rebuilt. That means read-only
access is faster, but updates then have to rebuild tons of stuff. Hmm. We
have a designer request to be able to use
photoshop to edit images. The server in question is a preview server, is
hosted, and not on his immediate network. Let's create a new webdav access
method, to make the content look like a
filesystem. Our system is getting heavily loaded. Let's have a separate
database server, with multiple web frontends. Cool,
that works. The system is still heavily loaded, we need a super-huge database
server.
Crap, still falling over. Time to have multiple read-only databases.
</brainstorming>
or...
<brainstorming>
Let's store all our content in the filesystem. That way, things like
ExpanDrive (remote ssh filesystem access for Windows) will work
for remote hosted machines. Caching isn't a problem anymore, as the raw store
is in files. Servers have been doing file
sharing for decades, it's a well known problem. Let's have someone else
maintain the file sharing code, we'll just use it to
support multiple frontends. And, ooh, our designers will be able to use the
tools they are familiar with to manipulate
things. And, we won't have the extra code running to maintain all the stuff in
the multiple databases. Cool, we can even use
git, with rebase and merge, to do all sorts of fancy branching and push/pulling
between multiple development scenarios.
</brainstorming>
If the raw store were in the filesystem in the first place,
then none of this additional layering would be
needed to make the final output end up looking like a filesystem, which is
what was being replaced all along.
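The "reuse existing tools" point above can be made concrete. A sketch, where store/ is a hypothetical raw content directory; the specific commands are illustrations, not a prescribed setup:

```shell
# A raw store that is just a directory tree.
mkdir -p store/content
echo 'hello' > store/content/page.html

# Backups and moves between locations are plain file operations.
cp -a store store-backup       # or rsync -a to another host

# Shipping the store elsewhere is equally routine.
tar -czf store.tar.gz store
```

Sharing the same tree over nfs, samba, or sshfs is a decades-old solved problem, so no emulation layer has to be written or maintained.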