When I think about power-down situations I've always been scared to death of pulling the plug on a normal hard drive that might be writing (for good reason), but SSDs are an entirely different matter.
Even when testing HAMMER I don't pull power on a real HD. I do it with a USB stick (hmm, I should probably buy an SSD now that capacities are getting reasonable and do depowering tests with it). Testing HAMMER on a normal HD involves only pulling the SATA port :-). Pulling USB sticks and HD SATA ports with a live-mounted filesystem doing heavy writing is rather fun. Even though USB doesn't generally support media sync, those cheap sticks tend to serialize writes anyway (having no real cache on-stick), so it is a reasonable simulation.

I don't know if gjournal is filesystem-aware. One of the major issues with softupdates is that there is no demarcation point that you can definitively roll back to which guarantees a clean fsck. On the other hand, even though a journal cannot create proper bulk barriers without being filesystem-aware, the journal can still enforce serialization of write I/O (from a post-recovery rollback standpoint), and that would certainly make a big difference with regard to fsck not choking on misordered data.

Scott mentioned barriers vs BIO_FLUSH. I was already assuming that Jeff's journaling code at least uses barriers (i.e. waits for all prior I/O to complete before issuing dependent I/O). That is mandatory, since both NCQ on the device and (potentially) bioqdisksort() (if it is still being used) will reorder write BIOs in progress. In an environment where a very high volume of writes is being pipelined into a hard drive, the drive's own RAM cache will start stalling the write BIOs and there will be continuous *SEVERE* reordering of the data as it gets committed to the media. BIO_FLUSH is the only safe way to deal with that situation. I strongly believe that the use of BIO_FLUSH is mandatory for any meta-data updates.
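To make the ordering hazard concrete, here is a toy Python simulation (my own invention, not real BIO code; the FakeDisk class and its methods are hypothetical) of a drive whose volatile write cache commits writes to media in arbitrary order. Issuing a flush (the BIO_FLUSH analogue) between the journal record and the dependent meta-data write guarantees that a power loss can never leave the meta-data on media without its journal record:

```python
import random

class FakeDisk:
    """Toy model of a drive with a volatile write cache that commits
    cached writes to media in arbitrary order (like NCQ reordering)."""
    def __init__(self):
        self.cache = []      # (sector, data) pairs not yet durable
        self.media = {}      # sector -> data actually on the platter

    def write(self, sector, data):
        self.cache.append((sector, data))

    def flush(self):
        # BIO_FLUSH analogue: everything cached becomes durable.
        for sector, data in self.cache:
            self.media[sector] = data
        self.cache.clear()

    def power_loss(self, rng):
        # On power loss an arbitrary subset of cached writes, in an
        # arbitrary order, may have reached the media.
        pending = self.cache[:]
        rng.shuffle(pending)
        for sector, data in pending[:rng.randrange(len(pending) + 1)]:
            self.media[sector] = data
        self.cache.clear()

def commit_with_barrier(disk, rng):
    disk.write(0, "journal-record")    # describe the meta-data update
    disk.flush()                       # flush: journal durable first
    disk.write(1, "metadata-update")   # dependent write issued only now
    disk.power_loss(rng)

rng = random.Random(42)
for _ in range(1000):
    disk = FakeDisk()
    commit_with_barrier(disk, rng)
    # Invariant: meta-data on media implies its journal record is on media.
    assert not (disk.media.get(1) and not disk.media.get(0))
```

If the `disk.flush()` call is removed, both writes sit in the cache together and the power-loss shuffle will eventually commit sector 1 without sector 0, violating the invariant.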
One can optimize the write()+fsync() path as a separate journal-only intent log entry which does not require a BIO_FLUSH (as it would not involve any meta-data updates at all to satisfy the fsync() requirements), and continue to use proper BIO_FLUSH semantics for the actual softupdates-related updates. By default fsync() in HAMMER will use BIO_FLUSH anyway, but I'm working on a more relaxed feature this week which does precisely what I described: it writes small amounts of write() data directly into the REDO log to satisfy the fsync() requirements and then worries about the actual data/meta-data updates later. The BIO_FLUSH for *JUST* that logical log entry then becomes optional.

I see another issue with the SUJ stuff, though it is minor in comparison to the others. It is not a good idea to depend on a CRC to validate log records in the nominal recovery case. The CRC should only be used to detect hard failure cases such as actual data corruption. What I do with HAMMER's UNDO/REDO log is place a log header with a sequence number on every single 512-byte boundary, and preformat the log area, to guarantee that all lost sector writes are detected and that no stale data will ever be misinterpreted as a log entry, without having to depend on the CRC (which I also have, of course). Large UNDO/REDO records are broken down into smaller pieces as necessary so as not to cross a 512-byte boundary.

If one does not do this, then the nominal failure with misordered writes can lay down the log header in one sector but fail to have laid down the rest of the log record in one or more other sectors that the record covers. One must then rely on the CRC to detect that case, which is dangerous because any mis-parsed information in the journal can destroy the filesystem faster than normal corruption would, and the prior contents of the log being overwritten may be partially valid or patterned and might defeat the CRC.
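A minimal sketch of the idea (pure Python with an invented toy record format, not HAMMER's actual on-media layout): every 512-byte sector of the log area starts with a small header carrying a sequence number, the log area is preformatted so stale data can never be misread, records are split into fragments so no fragment crosses a sector boundary, and recovery scan-detects the end of the valid log at the first sequence discontinuity, without trusting a CRC:

```python
import struct

SECTOR = 512
HDR = struct.Struct("<IHH")    # toy header: sequence, fragment length, flags
PAYLOAD = SECTOR - HDR.size    # usable payload bytes per sector

def format_log(nsectors):
    """Preformat the log area: every sector gets a header with seq 0,
    so leftover/random data can never be misinterpreted as a live entry."""
    return [HDR.pack(0, 0, 0).ljust(SECTOR, b"\0") for _ in range(nsectors)]

def append_record(log, start, seq, payload):
    """Lay down one logical record starting at sector index `start`,
    broken into fragments so no fragment crosses a 512-byte boundary."""
    off = start
    for i in range(0, len(payload), PAYLOAD):
        frag = payload[i:i + PAYLOAD]
        log[off % len(log)] = (HDR.pack(seq, len(frag), 0) + frag).ljust(SECTOR, b"\0")
        off += 1
        seq += 1
    return off, seq  # next free sector index and next sequence number

def scan_end(log, start, seq):
    """Recovery scan: walk forward while each sector header carries the
    expected ascending sequence number; stop at the first discontinuity.
    No CRC is needed to locate the end of the valid log."""
    off = start
    while True:
        hdr_seq, _, _ = HDR.unpack_from(log[off % len(log)])
        if hdr_seq != seq:
            return off, seq
        off += 1
        seq += 1

log = format_log(64)
off, seq = append_record(log, 0, 1, b"A" * 1200)   # spans 3 sectors
off, seq = append_record(log, off, seq, b"B" * 100)
end, _ = scan_end(log, 0, 1)
assert end == off   # the scan finds exactly the sectors that were written
```

A partially laid-down record here shows up as a sector whose header still carries the preformat sequence (or an out-of-order one), so the scan stops cleanly before it; the CRC is left to catch genuine media corruption only.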
The sequence number check is only sufficient for scan-detecting the span of the log if the disk offset in question only EVER contains log headers and never log bodies, i.e. if each header is aligned to an atomic sector (aka 512 bytes on-media) and record data can never appear at that particular offset. From the code documentation, the jsegrec (overall) structure appears to be able to span multiple disk sectors.

-Matt