>> This discussion begs the question: why store an mp3 file (or any file  
>> for that matter) as a blob in a database? Why not just store the name  
>> of the
>> file? What is the advantage to storing the data as a blob. You cannot  
>> use any of the database search or sort capabilities on blobs. One  
>> argument
>> I have heard is that storing the contents of a file in an RDBMS allows  
>> you to control access through the database rights. Server operating  
>> systems
>> provide file server rights that can do the same thing.

>But an operating system doesn't allow you to control to the same degree.  
>For example, you can grant read access to a file in an OS, but say you  
>want to grant access at a particular time, or for a snippet of the file  
>or, at the application level, "access" to the file that allows it only to  
>be read in a particular way.

Server OS's typically have a direct way to limit user access based on
time and day of the week automatically. If you need more than that, scheduled
jobs could automate this. As for snippets, the snippets could be in separate
files with separate access rights. The only way you can restrict access to
a snippet in an RDBMS is if you assume they only have access through
your application.

>Or, say that you want users to be able to write to a file, but only in the  
>manner specified by your application.

This assumes that the only database access the user has is through
your application. What if they use a general purpose database access
tool and go directly to the data?

>Let's say that you're trying to build a search algorithm, to make it  
>possible to meaningfully search for patterns. If you do any sort of  
>hashing, your first step will have to be to verify that the file hasn't  
>changed, or to monitor the OS constantly for changes.

Would you not need to do the same thing in an RDBMS? In a file
system, you have a timestamp on the file automatically maintained
by the OS. In an RDBMS you could do the same thing, but you would
have to manage it yourself with triggers and a separate timestamp
column.

>Generally, you've got a finer-grained control.

All of these arguments assume that the only access to the data is
through your application. The reality is that anyone with rights to
the table can use general purpose database access tools to go
directly to the data without using your application. We write these
kinds of tools and sell them regularly to companies who need
more from their data than a canned system offers them.

The common way people accomplish a closed database approach
is by using a middleware where the middle applications are the
only ones with rights to the RDBMS. This approach is only
acceptable to unsophisticated users who are unable to do anything
with their data beyond what your application does for them. What
if your installed base wants to use other tools (examples: Excel,
Crystal Reports), to access the same data? This middleware
approach assumes your users are willing to be captive to your
application. This is the opposite of an open system. As users
become more and more knowledgeable about managing their
own data, you will find the demand for such closed systems will
drop off precipitously. In fact, we find that users of closed systems
are usually unaware of this limitation and would not have purchased
the system in the first place had they known about it.

All of the requirements that you suggest could be accomplished
in other ways through the file system, and if you use the middleware
approach, you would gain all the control which you attribute to the
blob approach, i.e., the middle application would be the only one
with file system rights to the files.

My only point is that an RDBMS is not a file system
and using blob columns to store entire files is very inefficient and
reduces the performance of the RDBMS. You'll find that the main
proponents of using blob data in this way are DBA's and RDBMS
companies. I don't blame them for advocating this approach as it will
inevitably lead to the need to expand the RDBMS platform with
additional processors or additional machines in a cluster. This will
increase the licensing fees and expand the scope of the
DBA's job.

Another common situation where it seems advisable to store files
in the database is in large organizations where the network
administration and the database administration are performed by
different organizations or people. The DBA can't get the net admin
to go along with the rights changes that are needed, so they just
include the file data in the database. RDBMS performance, IMHO,
is much more important than these other issues.

Alternatively, the file data can be referenced with filenames in the
RDBMS and the files themselves stored on a different file server,
thus spreading the load.

When the only tool you have is a hammer, all problems look like
nails.

Glenn Lawler
www.incodesystems.com

Reply via email to