Let me warn you though - we got suspiciously high throughput when using "rwd", so we suspect it may not do what it is supposed to do, and crash recovery may not work as well as expected.
Very important, reliability is a top priority.
However, from what I gathered, "rwd" is preferred if you want to guarantee data reliability and are willing to delay (i.e., risk) the reliability of non-essential metadata.
As an aside, metadata appears to be (at least) the directory entry information associated with the database file.
"rwd" only updates essential metadata. Essential directory information would be the additional inode (Unix) record keeping required as a db grows in size. This metadata is required to ensure all database content is fully recoverable.
"rws" updates all metadata (even metadata non-essential to recovering the database content). Non-essential directory information would be, for example, the last-modified-time in the directory entry for the file containing the db. IOW, a stale last-modified-time is non-essential, since it is not needed to recover even the latest updates to the database.
From a performance perspective, "rwd" (relative to "rws") is very compelling. Basically, "rws" must modify not only the db file but also its directory entry (e.g., the last-modified-time) for every database transaction. In contrast, "rwd" just updates the db file (content) and is done. So "rwd" updates just one file whereas "rws" must update two (the db file and its directory entry) per transaction. In theory, then, "rwd" could be almost twice as fast as "rws" (I believe) in certain situations, e.g., workloads with many database modification transactions.
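To make the comparison concrete, here is a rough sketch of how one might time the two modes with java.io.RandomAccessFile. The Javadoc for that class describes "rwd" as synchronously writing every update to the file's content, and "rws" as doing the same for content and metadata. The class name, record size, and write count below are my own assumptions for illustration, not anything taken from Derby, and the numbers will vary a lot by platform, file system, and disk cache settings, so treat any result as a hint rather than proof.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative only: class and method names are hypothetical, not Derby code.
class SyncModeTiming {

    /** Writes `count` records of `size` bytes in the given mode; returns elapsed nanoseconds. */
    static long timeWrites(File file, String mode, int count, int size) throws IOException {
        byte[] record = new byte[size];
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            raf.setLength(0); // start from an empty file so each run is comparable
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                // In "rwd"/"rws" mode each write is requested to be synchronous.
                raf.write(record);
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("timing-" + mode, ".dat");
            f.deleteOnExit();
            long nanos = timeWrites(f, mode, 200, 512);
            System.out.println(mode + ": " + (nanos / 1_000_000) + " ms for 200 writes");
        }
    }
}
```

If both modes come out implausibly fast on some platform, that would be consistent with the suspicion that the writes are not really reaching the disk there.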
That said, I think it's important to judge "rwd" support on a platform by platform basis to verify it works as expected where desired. Also, I wonder if it works at least as well as "rws" everywhere? I gather the "serious" (e.g., Oracle) databases would use "rwd" instead of "rws" (for performance reasons) so one indication of a platform's "rwd" reliability may be whether Oracle runs on it (guessing).
Another question for Derby is whether it has the luxury (as do, e.g., Oracle and perhaps Cloudscape) of limiting its use to certain platforms. How does that work with open source databases, or Derby in particular? IOW, requiring proper "rwd"/"rws" support may be too stringent, but it does raise the question of whether there's any platform of sufficient interest which does not properly support "rwd".
-=-
On a more grounded note, my "rwd" patch is simple enough that it could also be made configurable, to flip from "rwd" to "rws" as desired. However, I'm not sure platforms without "rwd" would necessarily be any better at supporting "rws" (since "rwd" is really less demanding and more interesting than "rws"). I suspect the commercial approach was simply not to support platforms without full "rwd" support, but how does that translate to Apache Derby?
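For what it's worth, a configurable switch could be as small as the sketch below. Note that the property name "example.log.syncMode" and the class are made up for illustration; this is not the actual patch and not an existing Derby property. It defaults to "rwd" and only accepts "rwd" or "rws".

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;
import java.util.Locale;

// Illustrative only: not the actual patch, and not a real Derby setting.
class SyncModeConfig {

    // Hypothetical system property used to select the mode.
    static final String PROPERTY = "example.log.syncMode";

    /** Returns "rwd" when nothing is configured; otherwise accepts only "rwd" or "rws". */
    static String resolveMode(String configured) {
        if (configured == null) {
            return "rwd";
        }
        String mode = configured.trim().toLowerCase(Locale.ROOT);
        if (mode.equals("rwd") || mode.equals("rws")) {
            return mode;
        }
        throw new IllegalArgumentException("Unsupported sync mode: " + configured);
    }

    /** Opens the file using whichever mode the system property selects. */
    static RandomAccessFile open(File file) throws FileNotFoundException {
        return new RandomAccessFile(file, resolveMode(System.getProperty(PROPERTY)));
    }
}
```

Running with -Dexample.log.syncMode=rws would then flip the mode without a rebuild.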
I'm not sure what happens as file systems get more sophisticated, i.e., with more metadata than just directory entries to update or even create. I'm suspicious of this with OSX since it has a variety of metadata files which are unique to OSX. I want to test what's happening there and whether that's what's causing the inconsistent file-exists/file-not-found issue under certain database open circumstances using "rws". However, this issue may be moot (at least on OSX) if "rwd" is really the preferred method anyway _and_ has proper support where desired (hmm, this "rwd" feature could make a good candidate for a low-level unit test to verify "rwd"/"rws" database integrity on target platforms!).
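As a starting point for such a test, something like the following sketch could at least confirm that a platform accepts each mode and that the file exists and reads back intact afterwards. The class and method names are hypothetical. To be clear about its limits: this does not prove durability across a crash or power loss, which would need an external harness that kills the process (or the machine) mid-write.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Illustrative only: a readback check, not a real crash-recovery test.
class SyncModeReadback {

    /** Writes the payload in the given mode, then reopens the file read-only and compares. */
    static boolean writeAndVerify(File file, String mode, byte[] payload) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file, mode)) {
            out.setLength(0);
            out.write(payload);
        }
        if (!file.exists()) {
            return false; // would point at a file-exists/file-not-found style problem
        }
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            if (in.length() != payload.length) {
                return false;
            }
            byte[] actual = new byte[payload.length];
            in.readFully(actual);
            return Arrays.equals(payload, actual);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4, 5, 6, 7, 8};
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("readback-" + mode, ".dat");
            f.deleteOnExit();
            System.out.println(mode + " readback ok: " + writeAndVerify(f, mode, payload));
        }
    }
}
```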
All that said, I'd appreciate it if those with more experience in databases and file systems could chime in if they have any corrections, additions, etc.
Thanks,
= Joe =
