Adar Dembo has posted comments on this change. Change subject: KUDU-1377: Ignore corrupted trailing records on LBM metadata ......................................................................
Patch Set 3: (25 comments) http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 759: // The idea behind this test is that we should tolerate partial records at the Nit: we tolerate only _one_ partial record; the last one. A partial non-last record is, by definition, a corrupted record. Because this is an important distinction, I suggest you write this entire paragraph in the singular, not the plural (i.e. one container, one partial record, etc.), to avoid ambiguity. Line 761: // metdata files. A system crash or disk-full event can result in I wish we had more precision w.r.t. the kinds of failures that could result from these crashes. My gut says system crashes could yield truncated writes, but only across filesystem block boundaries, and since block records are less than 4096 bytes, I don't think we'd see a partial record there. But, clearly running out of disk space does produce such failures. The worst part is that the exact behavior depends on the filesystem in question, and even the version of the kernel. Anyway, I think the only reasonable thing we can do is to more explicitly state our lack of information. That's not necessarily an issue here, just something to keep in mind for generally documenting issues like this. Also nit: metadata Line 767: // TODO: Create block, then truncate metadata file, then start the thing up, Why is this a TODO? Line 784: gscoped_ptr<ReadableBlock> block; Why read the last block? What does that test? Line 788: // Reopen the block manager to prove that we can still append to a container Hmm, are you sure this isn't already tested by another test here, or by block_manager-stress-test? Line 809: LOG(INFO) << path; Is this still useful? Line 810: string metadata_path = path + ".metadata"; Would be good to consolidate this in a constant within LogBlockManager, so that this doesn't become desynchronized. Line 819: int ret = truncate(metadata_path.c_str(), meta_size + 1); How would you feel about opening metadata_path as an RWFile, then relying on Preallocate()/PunchHole() for truncation? They're insufficient; you'd probably need an explicit Truncate() method to shorten a file, but the abstraction does make sense. Line 821: PLOG(ERROR) << "truncate() of file " << metadata_path << " failed"; Can you PLOG(FATAL) and thus avoid the FAIL() below? Could do: PLOG_IF(FATAL, ret != 0) << "..." Below too. Line 826: // our blocks. Should add that the container will not be reused. Oh, I see you said that below. Okay, but would still be good to say it here, as I found the ASSERT_EQ(0, available_containers) to be surprising at first. http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 297: mutable bool append_disabled_; Why does this need to be mutable? Where are we modifying it in a const method? Line 443: if (PREDICT_TRUE(read_status.IsEndOfFile() || last_record_is_partial)) { This is another reason why ValidateAndRead() should return EOF in the partial record case. The contract for ReadNextPB() should be simpler than it is right now: - If you got EOF, you know you're at the end of the file. The only remaining question is whether you've got a dangling record at the end or not, which determines whether you disable append. - If you got Corruption, you've got a problem and can't recover. Line 447: append_disabled_ = true; Should probably LOG(WARN) something too. http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: Line 271: // Return the path of the given container. Nit: add ForTests() or some such? It'd be brittle for real code to rely on LogBlockContainer::ToString() providing the container path. http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util-test.cc File src/kudu/util/pb_util-test.cc: Line 249: { Nit: why the new scope? The other test cases don't do that. Line 254: int ret = truncate(path_.c_str(), known_good_size - 2); Again, consider adding Truncate() as a method to RWFile, and opening that here, to retain the env.h abstraction. Line 256: PLOG(ERROR) << "truncate() of file " << path_ << " failed"; Combine the if statement with PLOG_IF(FATAL, ret != 0) here. Line 266: for (int i = 0; i < 2; i++) { Why the loop? http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.cc File src/kudu/util/pb_util.cc: Line 458: // Avoid returning Status::EndOfFile to caller since it may be confusing. How so? At the moment, the LBM can't recover from any non-OK status returned from ReadablePBContainerFile::Init(), so it doesn't matter whether it's EndOfFile or Corruption. In the future, it could recover from an EndOfFile by throwing away the container, which we've determined to be either completely empty or with an incomplete header (and thus empty for purposes of storing blocks). Line 476: return Status::EndOfFile("Reached end of file"); This case doesn't set is_partial_record, and it should set it to true, right? Is this case not being covered by a test? http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.h File src/kudu/util/pb_util.h: Line 233: enum EofOK { Now unused? Line 249: // Return values: Rephrase: Return values: If this was the last complete record in the file, returns Status::EndOfFile. If there was extra data (i.e. partial or corrupt record) following this record, is_partial_record is set to true; otherwise it is false. If a partial or corrupt record is encountered mid-file, returns Status::Corruption. Line 282: // If 'file_size' < 'offset_' + 'length' then this function will not Let's make this more precise: we avoid incrementing offset_ in the case of an error, be it an off-the-end-of-the-file read or a short read. Meaning, say that offset_ is only incremented in the event of a successful read. Line 285: Status ValidateAndRead(size_t length, uint64_t file_size, On second thought, I don't think you need is_partial_record here at all. If you returned Status::EndOfFile when is_partial_record=true, you'd be communicating the same information with fewer variables, right? This isn't true for ReadNextPB() (where you need to distinguish between "end of file, clean" and "end of file with a little bit of data dangling at the end"), but no such distinguishing is needed here, evidenced by the fact that the new implementation of ValidateAndRead() doesn't ever return Status::EndOfFile. Line 287: bool* is_partial_record); is_partial_record isn't a great name here, because ValidateAndRead() doesn't read records, it reads whatever chunks of data other parts of the class have asked for. Maybe something like ' -- To view, visit http://gerrit.cloudera.org:8080/2595 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b88ba051b390b6a2587eecdd205c478f1edccb4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
