Dear derby-dev:

Given the Derby issue on OSX of open() calls failing for preallocated files in "rws" mode, I did some more research (Google) on what "rws" metadata means. I don't think it's quite as mysterious as it may have seemed at first (though there are still open questions about how O_SYNC is interpreted on OSX and other advanced operating systems). In any case, take the following with a grain of salt, and chime in if you can corroborate it or add information.

I gather that metadata typically refers to the directory information associated with the contents of the file. So, if a database file gets updated, then its "modified time", for example, should also be updated (in the directory entry for that file). If that's true, then:

"rwd" updates just the contents ensuring full data retrieval, but glosses over all but essential metadata (i.e., new block allocation metadata is handled, but directory timestamp updates are skipped).

"rws" updates not only the contents, but also non-essential metadata (directory timestamps such as "modified time" and "access time", et al.).
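To make the two modes concrete, here is a minimal sketch of opening the same file with "rwd" and then "rws" through java.io.RandomAccessFile. The class name, method name, and scratch file are made up for illustration; this is not Derby code, and what each mode really does underneath depends on the JVM and OS.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class, not part of Derby.
class SyncModeDemo {
    // Writes payload at offset 0 using the given RandomAccessFile mode.
    // Per the Javadoc, "rwd" asks for content updates to be written
    // synchronously; "rws" asks the same for content and metadata.
    static void writeWithMode(File file, String mode, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            raf.seek(0);
            raf.write(payload);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("syncmode", ".dat"); // scratch file (assumption)
        file.deleteOnExit();
        byte[] payload = "log record".getBytes("UTF-8");
        writeWithMode(file, "rwd", payload);
        writeWithMode(file, "rws", payload);
        System.out.println("wrote " + file.length() + " bytes in both modes");
    }
}
```

Both calls look identical from the Java side; any difference lies in how the JVM maps the mode string to the OS open flags.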

I'm not sure it's that simple (since Java most likely relies on the underlying OS support for O_SYNC, O_DSYNC, or their analogues), but at least it narrows the focus a bit. I get this impression from a variety of sites, but this URL is perhaps the clearest I found:

http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/fileio.htm#wq222

under "Synchronous I/O" where it mentions:

-=-
• Specified by the O_DSYNC open flag. When a file is opened using the O_DSYNC open mode, the write () system call will not return until the file data and all file system meta-data required to retrieve the file data are both written to their permanent storage locations.
• Specified by the O_SYNC open flag. In addition to items specified by O_DSYNC, O_SYNC specifies that the write () system call will not return until all file attributes relative to the I/O are written to their permanent storage locations, even if the attributes are not required to retrieve the file data.
-=-


IOW, I believe O_DSYNC should protect data integrity even though it (purposely, for performance reasons) avoids updating all associated metadata. O_SYNC is good too, but (at least according to the above) you pay a performance penalty. So, I think O_DSYNC may be a worthwhile substitute for O_SYNC as long as the incidental metadata is not all that important.

Bottom line: a production d/b with performance goals likely uses O_DSYNC (since O_SYNC is overkill if you just need to protect the data).
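If anyone wants to check the performance claim on their own platform, a rough timing sketch along these lines might do. The class name, write count, and page size are arbitrary choices of mine, and the numbers will vary a lot with the OS, filesystem, and disk cache, so treat the result as a hint rather than a benchmark.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical timing harness, not part of Derby.
class SyncModeTiming {
    // Returns the elapsed nanoseconds for `count` sequential writes of
    // `pageSize` bytes each, with the file opened in the given mode.
    static long timeWrites(File file, String mode, int count, int pageSize) throws IOException {
        byte[] page = new byte[pageSize];
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                raf.write(page);
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        int count = 200;     // arbitrary number of writes
        int pageSize = 4096; // arbitrary page size
        for (String mode : new String[] {"rw", "rwd", "rws"}) {
            File f = File.createTempFile("timing-" + mode, ".dat");
            f.deleteOnExit();
            long nanos = timeWrites(f, mode, count, pageSize);
            System.out.println(mode + ": " + (nanos / 1000000) + " ms for " + count + " writes");
        }
    }
}
```

If "rwd" and "rws" come out about the same on a given platform, that would at least be consistent with the two modes mapping to the same OS flag there.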

-=-

I also looked a bit into OSX for O_DSYNC in search of "rws"/"rwd" insights. I downloaded the sources for Darwin (OSX's BSD underpinnings). It appears that OSX only has an O_SYNC flag. The Darwin code says that O_DSYNC is not supported yet. So, in theory, O_DSYNC should degenerate to O_SYNC.

Unfortunately, even though I was able to look at the Darwin sources, I do not have the Apple Java sources to see how the flags are treated in OSX's JVM 1.4.2. (There is no JVM 1.5 (Java Tiger) release or prerelease yet, so I can't test against a newer version of Java.)

Having said all that, the question still remains: why does O_SYNC behave differently from O_DSYNC in the OSX JVM (especially since only O_SYNC exists in Darwin)? I don't know. The two knee-jerk hypotheses I have are:

1. jvm:O_SYNC is using darwin:O_SYNC, but jvm:O_DSYNC is darwin:no_sync (that would be bad). So, if you need O_DSYNC, you had better use O_SYNC (which fails mysteriously when the file is preallocated).

_or_

2. jvm:O_DSYNC uses darwin:O_SYNC (as it should), and jvm:O_SYNC uses darwin:O_SYNC and also synchronizes OSX metadata files like .DS_Store and resource forks (and ends up taking exception under ambiguous conditions in "rws" mode).

I don't know enough about typical O_SYNC, .DS_Store, or resource forks to know the answer to this mystery.

Bottom line: I know neither whether jvm:O_DSYNC protects data on OSX/Java1.4.2 (as it should), nor why jvm:O_SYNC is any different than jvm:O_DSYNC on OSX/Java1.4.2 (especially when jvm:O_DSYNC should degenerate to darwin:O_SYNC since Darwin only has O_SYNC).
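For anyone who wants to poke at this on their own OSX box, here is a small probe sketch. It preallocates a file with plain "rw" and then reopens it in a sync mode and writes one page. I have not confirmed that this matches the sequence Derby actually uses; setLength() as the preallocation step, the sizes, and the class and method names are all my assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical probe, not part of Derby.
class PreallocateProbe {
    // Preallocates `size` bytes using plain "rw", then reopens the file in
    // `syncMode` ("rwd" or "rws") and writes one 4096-byte page at offset 0.
    // Returns "ok" on success, or "failed: <exception>" if any step throws.
    static String probe(File file, long size, String syncMode) {
        try {
            try (RandomAccessFile pre = new RandomAccessFile(file, "rw")) {
                pre.setLength(size); // assumed stand-in for Derby's preallocation
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, syncMode)) {
                raf.seek(0);
                raf.write(new byte[4096]);
            }
            return "ok";
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("prealloc-" + mode, ".dat");
            f.deleteOnExit();
            System.out.println(mode + ": " + probe(f, 8192, mode));
        }
    }
}
```

A difference between the "rwd" and "rws" lines on OSX would at least confirm the symptom outside of Derby, though it would not by itself say which hypothesis is right.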

Anyway, that's probably more than enough for one post. If anyone has insight into any of these questions (e.g., anyone from the OSX Java team! ;-), please share.

Cheers,

= Joe =


