    [ http://issues.apache.org/jira/browse/DERBY-96?page=comments#action_59041 ]

Suresh Thalamati commented on DERBY-96:
---------------------------------------
The conclusion was to solve this problem by writing a checksum log record
before writing the log buffer, and verifying the checksum during recovery.
I don't know how to link derby-dev list e-mail to JIRA, so I am just
copy/pasting the comments from the e-mail list.

Mike Matrigali wrote:

>> I think that some fix to this issue should be implemented for the next
>> release. The order of my preference is #2, #1, #3.

I believe option #2 (checksumming the log records in the log buffers
before writing them to disk) is a good fix for this problem. If there are
no objections to this approach, I will start to work on it.

-suresht

>> I think that option #2 can be implemented in the logging system and
>> requires very few if any changes to the rest of the system's processing
>> of log records. Log record offsets remain efficient, i.e. they can use
>> LSNs directly. Only the boot-time recovery code needs to look for the
>> new log record and do the work to verify checksums; online abort is
>> unaffected.
>>
>> I would like to see some performance numbers on the checksum overhead,
>> and if it is measurable then maybe some discussion on checksum choice.
>> An obvious first choice would seem to be the standard Java-provided one
>> used on the data pages. If I had it to do over, I would probably have
>> used a different approach on the data pages. The point of the checksum
>> on a data page is not to catch data sector write errors; the system
>> expects the device to catch those. The only point is to catch
>> inconsistent sector writes (i.e. the 1st and 2nd 512-byte sectors
>> written but not the 3rd and 4th), and for that the current checksum is
>> overkill. One need not checksum every byte on the page; one can
>> guarantee a consistent write with 1 bit per sector in the page.
>>
>> In the future we may want to revisit #3 if it looks like the stream log
>> is an I/O bottleneck which can't be addressed by striping or some other
>> hardware help like smart caching controllers. I see it as a performance
>> project rather than a correctness project. It is also a lot more work
>> and risk. Note that this could be a good project for someone wanting to
>> do some research in this area, as it is implemented as a Derby module
>> where an alternate implementation could be dropped in if available.
>>
>> While I believe that we should address this issue, I should also note
>> that in all my time working on Cloudscape/Derby I have never received a
>> problem database (in that time any log-related error would have come
>> through me) that resulted from this out-of-order/incomplete log write
>> issue. This of course does not mean it has not happened, just that it
>> was not reported to us and/or did not affect the database in a
>> noticeable way. We have also never actually seen an out-of-order write
>> on the data pages; we have seen a few checksum errors, but all of those
>> were caused by a bad disk.
>>
>> On the upgrade issue, it may be time to start an upgrade thread. Here
>> are just some thoughts. If doing option #2, it would be nice if the new
>> code could still read the old log files and then optionally write the
>> new log record or not. Then if users wanted to run a release in a
>> "soft" upgrade mode, where they needed to be able to go back to the old
>> software, they could; they just would not get this fix. On a "hard"
>> upgrade the software should continue to read the old log files as they
>> are currently formatted, and for any new log files it should begin
>> writing the new log record. Once the new log record makes its way into
>> the log file, accessing the db with the old software is unsupported (it
>> will throw an error, as it won't know what to do with the new log
>> record).
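For illustration, here is a minimal sketch of what option #2 boils down
to. The class and method names are hypothetical, not Derby's actual
logging interfaces; the checksum used is java.util.zip.CRC32, on the
assumption that the "standard Java provided" checksum mentioned above is
the natural first choice:

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;

    // Hypothetical sketch of option #2: guard each log-buffer flush with
    // a checksum log record written ahead of the records it covers, and
    // verify that record during redo recovery.
    class ChecksumLogRecordSketch {

        // Build the checksum record for the buffered log records that are
        // about to be flushed to disk.
        static byte[] checksumRecordFor(byte[] buf, int off, int len) {
            CRC32 crc = new CRC32();
            crc.update(buf, off, len);
            ByteBuffer rec = ByteBuffer.allocate(12);
            rec.putInt(len);              // how many bytes are covered
            rec.putLong(crc.getValue());  // the checksum itself
            return rec.array();
        }

        // Recovery side: recompute the checksum over the bytes that
        // follow the checksum record and compare. A mismatch means a torn
        // or out-of-order write reached the disk.
        static boolean verifies(byte[] rec, byte[] buf, int off) {
            ByteBuffer bb = ByteBuffer.wrap(rec);
            int coveredLen = bb.getInt();
            long expected = bb.getLong();
            CRC32 crc = new CRC32();
            crc.update(buf, off, coveredLen);
            return crc.getValue() == expected;
        }
    }

In this scheme a failed verification marks the end of the usable log:
recovery stops at the checksum record instead of replaying the records it
covers, so a torn multi-sector write can never be mistaken for a complete
one, regardless of which sectors reached the disk.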
> partial log record writes that occur because of out-of-order writes need
> to be handled by recovery
> -------------------------------------------------------------------------
>
>          Key: DERBY-96
>          URL: http://issues.apache.org/jira/browse/DERBY-96
>      Project: Derby
>         Type: New Feature
>   Components: Store
>     Versions: 10.0.2.1
>     Reporter: Suresh Thalamati
>     Assignee: Suresh Thalamati
>
> An incomplete log record write that occurs because of an out-of-order
> partial write gets recognized as complete during recovery if the first
> sector and the last sector happen to get written. The current system
> recognizes incompletely written log records by checking the length of
> the record, which is stored at both the beginning and the end. The
> format log records are written to disk in is:
>
>   +----------+-------------+----------+
>   |  length  | LOG RECORD  |  length  |
>   +----------+-------------+----------+
>
> This mechanism works fine if sectors are written in sequential order, or
> if the log record is smaller than 2 sectors. I believe that on SCSI-type
> disks the order is not necessarily sequential; SCSI disk drives may
> sometimes reorder the sectors to optimize performance. If a log record
> that spans multiple disk sectors is being written to a SCSI-type device,
> it is possible that the first and last sectors are written before a
> crash. If this occurs, the recovery system will incorrectly conclude
> that the log record was completely written and will replay the record.
> This could lead to recovery errors or data corruption.
>
> This problem also will not occur if the disk drive has a write cache
> with battery backup, which makes sure the I/O request completes.
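For comparison, here is a sketch of the length-bracketed check described
above (again with hypothetical names, not the store's real code). Because
only the leading and trailing lengths are compared, a multi-sector record
whose first and last sectors reached the disk but whose middle sectors did
not will still pass:

    import java.nio.ByteBuffer;

    // Sketch of the existing mechanism: a record is laid out as
    // | length | LOG RECORD | length |, and recovery accepts it when the
    // two lengths agree.
    class LengthBracketSketch {

        static byte[] writeRecord(byte[] logRecord) {
            ByteBuffer out = ByteBuffer.allocate(4 + logRecord.length + 4);
            out.putInt(logRecord.length);   // leading length
            out.put(logRecord);             // the record payload
            out.putInt(logRecord.length);   // trailing length
            return out.array();
        }

        // Recovery-time test. If sectors are written out of order, the
        // sector holding the leading length and the sector holding the
        // trailing length may both be on disk while the middle sectors
        // hold stale bytes; this check still returns true, which is
        // exactly the hole the checksum log record is meant to close.
        static boolean looksComplete(ByteBuffer log) {
            int front = log.getInt();
            if (front <= 0 || front > log.remaining() - 4) return false;
            log.position(log.position() + front);   // skip the payload
            int back = log.getInt();
            return front == back;
        }
    }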
