> and compare that with how often one does a find on a production server vs how
> long it takes to index and subsequently call up the file via a DB interface
> on a busy file-centric server.
I would guess that the performance impact of a single large search would be about equal to the combined impact of a whole day of writing tiny bits of extraneous info. Let's look at RAID 5: writing parity data continually has virtually no impact on the user experience, and returns a valuable benefit (redundancy). On the other hand, a single BIG hit of doing all that writing at once (rebuilding an array after a disk failure) has a serious and noticeable impact on performance. The impact may be short-lived, but it affects everyone noticeably, whereas small bits here and there don't.

As for the DB interface, I don't see why this would be needed. Just as you can "cat file >> lpt", you could use readily existing tools to interface with a filesystem housed in a database. Reiser uses the same tools as ext2/3, for example.

> > People already use workarounds (such as locate on Linux desktops or
> > FindFast on legacy Windows desktops) because of the performance problems of
> > our current File Systems. They are inadequate. And the amount of stored
> > info is RAPIDLY increasing.
>
> this is a fine argument for the desktop, but not nearly as compelling on the
> server for many if not most server-oriented tasks. now, new server approaches
> may arise as this technology becomes available, but existing uses will remain
> as they are because they basically work.

Same with ARCnet, or Token Ring. They work, but if there's something better, people use it instead. (And no, I don't want to get into a discussion of the technology behind various topologies. I fully understand the benefits of Token Ring. But when I can carry 1 Gbit over Ethernet at the same cost as 4 Mbit over TR, I'm switching, collisions or not.)

How much faster and more efficient would Squid be if its cache were in a database? The ACL extensions on the newer filesystems are an indicator that our needs are passing the current abilities of the FS. We need more than what they provide now.
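To make the ACL point concrete, here is a small sketch of how the traditional tools stop telling the whole story once ACLs enter the picture. The file name and the user "carol" are made up for illustration, and this assumes a filesystem mounted with ACL support:

```shell
# Traditional Unix permissions only express owner/group/other:
touch report.txt
chmod 640 report.txt              # owner rw-, group r--, other ---

# Granting ONE extra user read access requires a POSIX ACL and a
# separate tool set (user "carol" is hypothetical; this line will
# fail without ACL support or if the user does not exist):
setfacl -m u:carol:r report.txt
getfacl report.txt

# 'ls -l' now shows only a '+' after the mode bits as a hint that
# chmod/chown/chgrp no longer tell the whole story:
ls -l report.txt
rm -f report.txt
```

The mode bits alone no longer describe who can read the file; you need the second tool set just to see what the first one can't show.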
At some point there comes a time to say let's fix the root of the problem rather than taking ext2 and adding some journaling, and adding a filename database (locate), and adding hacks that allow extended attributes (ACLs, etc.). When does it end? ACLs are a pain because the old tools - chmod, chown, chgrp - don't really work anymore. getfacl and setfacl are a nuisance caused by a workaround.

> btw, Oracle tried making this argument for a DB FS many years ago. no one
> bought it then.

Oracle was selling THEIR solution. I think Oracle's product was rejected, not the idea in general. Kinda like Sun's network PC.

> > Look at /proc, or /dev, etc. More is being expected of the file system,
> > because it makes sense to see it as part of the file system. This will
> > increase, not decrease over time.
>
> while /proc and /dev are generally useful and incur no real overhead to the
> actual filesystem, i don't know if the same can be said for a DB FS.

They're useful because they're integrated, and they allow people to access something that on other OSes isn't easily available, using tools they already know. I can "cat /proc/sys/net/ipv4/ip_forward"; cat is a common tool, and normally RAM contents are not so easily accessed. This is the REASON /proc is valuable. When I used peeks and pokes on the C=64, it was all guesswork, and harder to fix if I was wrong. Integration with the FS allows /proc to provide simple access to data that wouldn't otherwise be available, and it allows that access using tools that everyone is already comfortable with. That *IS* its benefit.

> i'd also expect that the more intensive the metadata handling is on the FS
> level, the harder it will be to get decent performance out of older systems
> and smaller disks. look at the impact of journaling, for instance.

This is probably true, but I'd venture to guess that it depends on loading. As I said at the top, there is far less impact in taking a continual slight hit than in taking a single huge one.
This will be particularly true on older systems. If the system runs at 60% disk I/O utilization now, then looking forward, something like Reiser might run it at 75%. But that could still be better than running it at 100% for the hour or so it takes to rebuild the locate database or to return the results of a single "find" command.

Kev.
