Randy Kramer wrote:
> On Tuesday 27 September 2005 01:44 pm, Pete Jewell wrote:
>
>> However, ReiserFS is *much* more efficient when you have thousands of
>> files in one directory, because it uses a hashing algorithm to determine
>> where the required file is (or starts) in the filesystem. This is
>> something I know about (hashing) based on my experience with Pick
>> database systems, which also use hashing and are incredibly fast at
>> keyed record retrieval (as well as entire file/table traversal).
>
> Thanks to Derek, Hendrik, and Pete for the replies!
>
> Is there a chance that the hash for a ReiserFS can become corrupted like the
> index for a mbox file can be? Or maybe I should ask it differently, because
> presumably something can happen to make it corrupted--does Reisers have some
> better error detection / correction / recovery for the hash than is typical
> of an index for an mbox file?
>
> (Maybe I need to go read up on Reiser, and join a Reiser list. ;-)

The beauty of a hash that is used to locate a file, or record, is that it is based on the key, or the filename in the case of ReiserFS. (Not having looked at the internals of ReiserFS, I'm assuming it's the filename; it could conceivably be anything that is stored against the inode.) The important thing is that we have something we can feed into a mathematical expression to determine where on the filesystem the beginning of the file can be found.

<quick pick lesson>
In Pick, the filesystem (and memory) is organised into frames. The location of a piece of data within that space can be determined mathematically, at least down to a pointer to the frame where the data starts. The frames are usually quite small, so there's no performance hit in loading one into memory and scanning it for the exact location within it (helped by the delimiters between tables, records, fields, and even fields within fields). The version of Pick I use at work (D3) uses 4K frames, but they've ranged from 512 bytes upwards in various implementations over the years.
</quick pick lesson>

As such, finding the file is not as susceptible to corruption as it would be with an index, because there is no index to corrupt (unless the inode gets trashed; the same risk applies there). Retrieval time is drastically reduced too, as there's no index to follow, just some integer math.

However, the files themselves wouldn't inherently be less susceptible to corruption, but that's where a (full) journal can help out.

HTH
--
PeteJ
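
P.S. To make the "just some integer math" point concrete, here is a rough Python sketch of a hashed lookup into fixed-size frames. It is only an illustration under my own assumptions: `toy_hash`, `frame_for_key`, `BASE_FRAME` and `MODULO` are hypothetical names and values, and the hash is a generic string hash, not the algorithm that Pick or ReiserFS actually uses. Only the 4K frame size comes from the D3 figure mentioned above.

```python
FRAME_SIZE = 4096   # bytes per frame (the 4K D3 frame size mentioned above)
BASE_FRAME = 1000   # hypothetical: number of the first frame of the file
MODULO = 7          # hypothetical: number of frames the keys are spread over


def toy_hash(key: str) -> int:
    """Illustrative string hash; NOT the real Pick or ReiserFS algorithm."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * 31 + byte) % 2**32
    return h


def frame_for_key(key: str) -> int:
    """Map a key to a frame number using integer math only, no index."""
    return BASE_FRAME + toy_hash(key) % MODULO


def byte_offset(frame: int) -> int:
    """Byte offset at which the given frame starts."""
    return frame * FRAME_SIZE


if __name__ == "__main__":
    for name in ("inbox", "sent-mail", "drafts"):
        frame = frame_for_key(name)
        print(f"{name!r} -> frame {frame}, byte offset {byte_offset(frame)}")
```

The same key always lands on the same frame, so a lookup costs one hash computation plus a scan of one small frame, however many records the file holds. Two keys can land on the same frame, which is why the frame still has to be scanned for the exact record.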

