Mike Percy has posted comments on this change. Change subject: KUDU-1377: Ignore corrupted trailing records on LBM metadata ......................................................................
Patch Set 3: (25 comments) http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 759: // The idea behind this test is that we should tolerate partial records at the > Nit: we tolerate only _one_ partial record; the last one. A partial non-las Done Line 761: // metdata files. A system crash or disk-full event can result in > I wish we had more precision w.r.t. the kinds of failures that could result Done Line 767: // TODO: Create block, then truncate metadata file, then start the thing up, > Why is this a TODO? Removed Line 784: gscoped_ptr<ReadableBlock> block; > Why read the last block? What does that test? To prove that we can open the block. It fails later. Line 788: // Reopen the block manager to prove that we can still append to a container > Hmm, are you sure this isn't already tested by another test here, or by blo Removed Line 809: LOG(INFO) << path; > Is this still useful? Removed Line 810: string metadata_path = path + ".metadata"; > Would be good to consolidate this in a constant within LogBlockManager, so Done Line 819: int ret = truncate(metadata_path.c_str(), meta_size + 1); > How would you feel about opening metadata_path as an RWFile, then relying o Done Line 821: PLOG(ERROR) << "truncate() of file " << metadata_path << " failed"; > Can you PLOG(FATAL) and thus avoid the FAIL() below? Moot point with the RWFile Line 826: // our blocks. > Should add that the container will not be reused. Done http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 297: mutable bool append_disabled_; > Why does this need to be mutable? Where are we modifying it in a const meth In LogBlockContainer::ReadContainerRecords() Line 443: if (PREDICT_TRUE(read_status.IsEndOfFile() || last_record_is_partial)) { > This is another reason why ValidateAndRead() should return EOF in the parti See my other comment about this. let's leave the API as-is. Line 447: append_disabled_ = true; > Should probably LOG(WARN) something too. Done http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: Line 271: // Return the path of the given container. > Nit: add ForTests() or some such? It'd be brittle for real code to rely on That's why I made it private, but this doesn't hurt. Done http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util-test.cc File src/kudu/util/pb_util-test.cc: Line 249: { > Nit: why the new scope? The other test cases don't do that. It creates variables that are not reusable (pb_file). Better to keep the test scope a little cleaner. Line 254: int ret = truncate(path_.c_str(), known_good_size - 2); > Again, consider adding Truncate() as a method to RWFile, and opening that h Done Line 256: PLOG(ERROR) << "truncate() of file " << path_ << " failed"; > Combine the if statement with PLOG_IF(FATAL, ret != 0) here. Moot point w/ RWFile::Truncate() Line 266: for (int i = 0; i < 2; i++) { > Why the loop? To verify that the response is repeatable. I'll add a comment. http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.cc File src/kudu/util/pb_util.cc: Line 458: // Avoid returning Status::EndOfFile to caller since it may be confusing. > How so? At the moment, the LBM can't recover from any non-OK status returne Well, this was a suggestion you made on the previous rev, but I've now reverted it (it's simpler not to coerce it to Corruption). Line 476: return Status::EndOfFile("Reached end of file"); > This case doesn't set is_partial_record, and it should set it to true, righ No, there is no partial record. It is truly the end of the file. I've added an assert for this. http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.h File src/kudu/util/pb_util.h: Line 233: enum EofOK { > Now unused? Thanks for the catch. Line 249: // Return values: > Rephrase: Rephrased Line 282: // If 'file_size' < 'offset_' + 'length' then this function will not > Let's make this more precise: we avoid incrementing offset_ in the case of Done Line 285: Status ValidateAndRead(size_t length, uint64_t file_size, > On second thought, I don't think you need is_partial_record here at all. If Yeah but the API is clunkier and ReadNextPB() is uglier. Line 287: bool* is_partial_record); > is_partial_record isn't a great name here, because ValidateAndRead() doesn' renamed to is_partial_read -- To view, visit http://gerrit.cloudera.org:8080/2595 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b88ba051b390b6a2587eecdd205c478f1edccb4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-HasComments: Yes
