> and compare that with how often one does a find on a production server vs
> how long it takes to index and subsequently call up the file via a DB
> interface on a busy file-centric server.

I would guess that the performance impact of a single large search would be
about equal to the combined impact of a whole day of writing tiny bits of
extraneous info.  Let's look at RAID 5.  Writing parity data continually has
virtually no impact on the user experience, and returns a valuable benefit
(redundancy).  On the other hand, a single BIG hit of doing all that writing
at once (rebuilding an array after a disk failure) has a serious and
noticeable impact on performance.  The impact may be short-lived, but it
affects everyone noticeably, whereas small bits here and there don't.

As for the DB interface, I don't see why this would be used.  Just as you
can "cat file >> lpt", you could use readily existing tools to interface
with an FS housed in a database.  Reiser uses the same tools as ext2/3, for
example.
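A quick sketch of that point: userland tools go through the kernel's VFS
layer, so they are filesystem-agnostic, and a DB-backed FS mounted somewhere
would be driven with the same commands as ext2/3 or reiserfs:

```shell
# Userland tools don't care what the filesystem backend is; the VFS
# layer hides it.  GNU stat can report the backend type, but ls, cat,
# cp, etc. work identically no matter what it says.
stat -f -c 'root filesystem type: %T' /   # backend type is incidental
ls / > /dev/null && echo "ls works regardless of backend"
```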

> > People already use workarounds (such as locate on Linux desktops or
> > FindFast on legacy Windows desktops) because of the performance problems
> > of our current File Systems.  They are inadequate.  And the amount of
> > stored info is RAPIDLY increasing.
>
> this is a fine argument for the desktop, but not nearly as compelling on
> the server for many if not most server-oriented tasks. now, new server
> approaches may arise as this technology becomes available, but existing
> uses will remain as they are because they basically work.

Same with Arcnet, or Token Ring.  They work, but if there's something
better, people use it instead.  (And no, I don't want to get into a
discussion on the technology behind various topologies.  I fully understand
the benefits of Token Ring.  But when I can carry 1 Gig over ethernet at the
same cost as 4 meg over TR, I'm switching, collisions or not.)

How much faster and more efficient would Squid be if the cache were in a
database?

The ACL extensions on the newer file systems are an indicator that things
are passing the current abilities of the FS.  We need more than what they
provide now.  At some point there comes a time to say let's fix the root of
the problem rather than taking ext2 and adding some journaling, adding a
file-contents database (locate), and adding hacks that allow extended
attributes (ACL, etc).  When does it end?  ACL is a pain because the old
tools - chmod, chown, chgrp - don't really work anymore.  getfacl and
setfacl are a nuisance caused by a workaround.
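To make the nuisance concrete (the user name "alice" below is hypothetical):

```shell
# chmod and ls -l only speak owner/group/other mode bits:
f=$(mktemp)
chmod 640 "$f"
perms=$(stat -c '%a' "$f")
echo "traditional mode bits: $perms"
# A per-user grant needs the newer ACL tools instead, e.g.:
#   setfacl -m u:alice:r "$f"   # grant read to one extra user
#   getfacl "$f"                # ls -l alone no longer tells the whole story
rm -f "$f"
```

Once an ACL is set, the mode bits shown by ls -l are only a summary, which
is exactly why the old tools stop telling the whole truth.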

> btw, Oracle tried making this argument for a DB FS many years ago. no one
> bought it then.

Oracle was selling THEIR solution.  I think ORACLE was rejected, not the
idea generally.  Kinda like Sun's network PC.

> > Look at /proc, or /dev, etc.  More is being expected of the file system,
> > because it makes sense to see it as part of the file system.  This will
> > increase, not decrease over time.
>
> while /proc and /dev are generally useful and incur no real overhead to
> the actual filesystem, i don't know if the same can be said for a DB FS.

They're useful because they're integrated, and allow people to access
something that on other OSes isn't easily available, using tools that they
already know.  I can "cat /proc/sys/net/ipv4/ip_forward".  Cat is a common
tool, and RAM contents are not normally so easily accessed.  This is the
REASON /proc is valuable.  When I used peeks and pokes on the C=64, I was
mostly guessing, and changing things when I guessed wrong.  Integration with
the FS lets /proc provide simple access to data that wouldn't be available
otherwise, and it allows that access using tools that everyone is already
comfortable with.  That *IS* its benefit.
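The ip_forward example in full — kernel state read with an ordinary tool
(writing the value back works the same way but needs root, so it is shown
only as a comment):

```shell
# Read a kernel tunable with nothing more exotic than cat:
cat /proc/sys/net/ipv4/ip_forward        # prints 0 or 1
# Writing works the same way, as root:
#   echo 1 > /proc/sys/net/ipv4/ip_forward   # enable IP forwarding
```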

> i'd also expect that the more intensive the metadata handling is on the FS
> level, the harder it will be to get decent performance out of older
> systems and smaller disks. look at the impact of journaling, for instance.

This is probably true, but I'd venture to guess that it depends on loading,
as I said at the top.  There is far less impact in taking a continual slight
hit than in taking a single huge one.  This will be particularly true on
older systems.  If the system runs at 60% disk I/O utilization now, then
looking forward, perhaps something like reiser would run it at 75%.  But
that might be better than running it at 100% during the hour or so that it
takes to rebuild the locate database or return the results of a single
"find" command.
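Back-of-envelope with those assumed (not measured) numbers: a steady 75%
all day, versus a 60% baseline with one hour pegged at 100%.  The spiky day
actually averages *lower*, but the saturated hour is the one every user
feels:

```shell
# Average and peak disk utilization over a 24-hour day for the two
# scenarios described above.
awk 'BEGIN {
  printf "continual: avg %.1f%%, peak %d%%\n", 75, 75
  printf "one-shot:  avg %.1f%%, peak %d%%\n", (23*60 + 1*100)/24, 100
}'
```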

Kev.
