On Apr 13, 2005, at 4:25 AM, Gav.... wrote:


From: "Bart Lateur"

| BTW I tend to agree with you, without actually being convinced of its
| technological superiority. BTW one can use mod_rewrite to nicen up the
| URL for images.
|
| But files inside databases tend to blow up the actual database files.
| There's a lot of air in databases. Wasted disk space.


I seem to be getting confused here about databases and file structures.
Yes, a database holds data, but when it comes to images, does the database
not just hold a pointer to the image? At the end of the day, the database itself
is held on the hard drive the same as any other file, in sectors, as '1's and
'0's, so ultimately both come from the same source. A database may seem to
be held in one place, but it is still probably fragmented all over the place.


I am open to correction, my flame retardant suit is on and ready :)


Well as usual it depends. :-)

There are many types of filesystems that can behave very differently depending on how they're used.
For example, if you don't care about journaling, ext2 is simple and fast. If you want journaling and loads of small files, you're better off with ReiserFS, etc.
The different filesystems use different algorithms for determining where the files reside, different block sizes, different defragmentation techniques, etc. All of this affects how well each filesystem performs for your application.
Add on top of that the different buffering and caching schemes of the OSes, and you can't really tell what's better unless you try out a number of combinations.
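
To make "try out a number of combinations" a bit more concrete, here is a minimal timing sketch in Python. It is only an illustration, not a proper benchmark: the function name, file count and file size are made up for the example, and the read timings will mostly reflect the OS cache rather than the disk. Point it at directories on the filesystems you want to compare.

```python
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=4096):
    """Write, then read back, `count` files of `size` bytes each.

    Returns (write_seconds, read_seconds). Hypothetical helper for illustration.
    """
    payload = os.urandom(size)
    paths = [os.path.join(directory, f"f{i:06d}.bin") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk so writes are not only cached
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())  # likely served from the OS cache right after writing
    read_seconds = time.perf_counter() - start

    if total != count * size:
        raise RuntimeError("read back a different number of bytes than written")
    return write_seconds, read_seconds


if __name__ == "__main__":
    # A temporary directory stands in for a mount point on the filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        w, r = time_small_files(d, count=200, size=4096)
        print(f"write: {w:.3f}s  read: {r:.3f}s")
```

Run it several times per filesystem and with file sizes close to your real data; a single run tells you very little.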


If you're going to use a database as a filesystem where you actually store the binaries (as opposed to a filesystem 'pointer'), you have to look at the same issues: access algorithms, caching, block sizes, etc.
Some databases can only be installed on top of 'cooked' filesystems, i.e. the database data resides on top of the filesystem and can only use the filesystem block size. Other databases can use 'raw' partitions for their data, which allows them to fully control how the partition is managed, down to the block size and caching mechanism. So if you have larger files, you can use 16KB data pages, which can significantly increase throughput compared with a 4KB default.
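
As a small illustration of a tunable page size, here is a sketch using SQLite through Python's sqlite3 module. SQLite is my choice for the example only; it is not one of the databases discussed above and it has no raw-partition support, but it does let you pick the page size when a database is created. The table and function names are invented for the example.

```python
import os
import sqlite3
import tempfile


def create_db(path, page_size):
    """Create a new SQLite database file that uses the requested page size."""
    conn = sqlite3.connect(path)
    # The page size only takes effect if it is set before the database
    # gets any content, so issue the PRAGMA before creating tables.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.commit()
    return conn


def page_size_of(conn):
    """Return the page size the open database is actually using."""
    return conn.execute("PRAGMA page_size").fetchone()[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn = create_db(os.path.join(d, "large_pages.db"), 16384)
        print("page size in use:", page_size_of(conn))
        conn.close()
```

Whether larger pages actually help depends on the size of your binaries and on the caching underneath, so this is something to measure rather than assume.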


In my experience, you just have to try a number of scenarios to see which one is best for your application. I personally prefer keeping the binary data out of the database and on the filesystem, where I have better control over low-level access to that data and can better manage backup and maintenance.
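
For what it's worth, the "pointer" approach can be sketched roughly like this in Python, again with SQLite standing in for whatever database you use. The table layout and the function names (open_index, store_image, load_image) are hypothetical, and a real setup would also need to handle name collisions, deletes and files that go missing.

```python
import os
import sqlite3
import tempfile


def open_index(db_path):
    """Open (or create) the database that maps image names to file paths."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def store_image(conn, image_dir, name, data):
    """Write the bytes to the filesystem and record only the path in the database."""
    os.makedirs(image_dir, exist_ok=True)
    path = os.path.join(image_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO images (name, path) VALUES (?, ?)",
            (name, path),
        )
    return path


def load_image(conn, name):
    """Look up the path in the database, then read the bytes from the file."""
    row = conn.execute(
        "SELECT path FROM images WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    with open(row[0], "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn = open_index(os.path.join(d, "index.db"))
        store_image(conn, os.path.join(d, "images"), "logo.png", b"not really a PNG")
        print(len(load_image(conn, "logo.png")), "bytes read back")
        conn.close()
```

The database stays small and the image files can be backed up, moved or served with ordinary filesystem tools; the price is that you have to keep the table and the directory in sync yourself.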

H


