Adar Dembo has posted comments on this change.

Change subject: KUDU-1377: Ignore corrupted trailing records on LBM metadata
......................................................................


Patch Set 3:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 759: // The idea behind this test is that we should tolerate partial 
records at the
Nit: we tolerate only _one_ partial record; the last one. A partial non-last 
record is, by definition, a corrupted record.

Because this is an important distinction, I suggest you write this entire 
paragraph in the singular, not the plural (i.e. one container, one partial 
record, etc.), to avoid ambiguity.


Line 761: // metdata files. A system crash or disk-full event can result in
I wish we had more precision w.r.t. the kinds of failures that could result 
from these crashes. My gut says system crashes could yield truncated writes, 
but only across filesystem block boundaries, and since block records are less 
than 4096 bytes, I don't think we'd see a partial record there. But, clearly 
running out of disk space does produce such failures. The worst part is that 
the exact behavior depends on the filesystem in question, and even the version 
of the kernel.

Anyway, I think the only reasonable thing we can do is to more explicitly state 
our lack of information. That's not necessarily an issue here, just something 
to keep in mind for generally documenting issues like this.

Also nit: metadata


Line 767:   // TODO: Create block, then truncate metadata file, then start the 
thing up,
Why is this a TODO?


Line 784:   gscoped_ptr<ReadableBlock> block;
Why read the last block? What does that test?


Line 788:   // Reopen the block manager to prove that we can still append to a 
container
Hmm, are you sure this isn't already tested by another test here, or by 
block_manager-stress-test?


Line 809:   LOG(INFO) << path;
Is this still useful?


Line 810:   string metadata_path = path + ".metadata";
Would be good to consolidate this in a constant within LogBlockManager, so that 
this doesn't become desynchronized.


Line 819:   int ret = truncate(metadata_path.c_str(), meta_size + 1);
How would you feel about opening metadata_path as an RWFile, then relying on 
Preallocate()/PunchHole() for truncation? They're insufficient; you'd probably 
need an explicit Truncate() method to shorten a file, but the abstraction does 
make sense.


Line 821:     PLOG(ERROR) << "truncate() of file " << metadata_path << " 
failed";
Can you PLOG(FATAL) and thus avoid the FAIL() below?

Could do:

  PLOG_IF(FATAL, ret != 0) << "..."

Below too.


Line 826:   // our blocks.
Should add that the container will not be reused.

Oh, I see you said that below. Okay, but would still be good to say it here, as 
I found the ASSERT_EQ(0, available_containers) to be surprising at first.


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 297:   mutable bool append_disabled_;
Why does this need to be mutable? Where are we modifying it in a const method?


Line 443:   if (PREDICT_TRUE(read_status.IsEndOfFile() || 
last_record_is_partial)) {
This is another reason why ValidateAndRead() should return EOF in the partial 
record case. The contract for ReadNextPB() should be simpler than it is right 
now:
- If you got EOF, you know you're at the end of the file. The only remaining 
question is whether you've got a dangling record at the end or not, which 
determines whether you disable append.
- If you got Corruption, you've got a problem and can't recover.


Line 447:       append_disabled_ = true;
Should probably LOG(WARN) something too.


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

Line 271:   // Return the path of the given container.
Nit: add ForTests() or some such? It'd be brittle for real code to rely on 
LogBlockContainer::ToString() providing the container path.


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util-test.cc
File src/kudu/util/pb_util-test.cc:

Line 249:   {
Nit: why the new scope? The other test cases don't do that.


Line 254:     int ret = truncate(path_.c_str(), known_good_size - 2);
Again, consider adding Truncate() as a method to RWFile, and opening that here, 
to retain the env.h abstraction.


Line 256:       PLOG(ERROR) << "truncate() of file " << path_ << " failed";
Combine the if statement with PLOG_IF(FATAL, ret != 0) here.


Line 266:     for (int i = 0; i < 2; i++) {
Why the loop?


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.cc
File src/kudu/util/pb_util.cc:

Line 458:     // Avoid returning Status::EndOfFile to caller since it may be 
confusing.
How so? At the moment, the LBM can't recover from any non-OK status returned 
from ReadablePBContainerFile::Init(), so it doesn't matter whether it's 
EndOfFile or Corruption.

In the future, it could recover from an EndOfFile by throwing away the 
container, which we've determined to be either completely empty or with an 
incomplete header (and thus empty for purposes of storing blocks).


Line 476:     return Status::EndOfFile("Reached end of file");
This case doesn't set is_partial_record, and it should set it to true, right? 
Is this case not being covered by a test?


http://gerrit.cloudera.org:8080/#/c/2595/3/src/kudu/util/pb_util.h
File src/kudu/util/pb_util.h:

Line 233:   enum EofOK {
Now unused?


Line 249:   // Return values:
Rephrase:

  Return values:
    If this was the last complete record in the file, returns 
Status::EndOfFile. If there was extra data (i.e. partial or corrupt record) 
following this record, is_partial_record is set to true; otherwise it is false.
    If a partial or corrupt record is encountered mid-file, returns 
Status::Corruption.


Line 282:   // If 'file_size' < 'offset_' + 'length' then this function will not
Let's make this more precise: we avoid incrementing offset_ in the case of an 
error, be it an off-the-end-of-the-file read or a short read. Meaning, say that 
offset_ is only incremented in the event of a successful read.


Line 285:   Status ValidateAndRead(size_t length, uint64_t file_size,
On second thought, I don't think you need is_partial_record here at all. If you 
returned Status::EndOfFile when is_partial_record=true, you'd be communicating 
the same information with fewer variables, right?

This isn't true for ReadNextPB() (where you need to distinguish between "end of 
file, clean" and "end of file with a little bit of data dangling at the end"), 
but no such distinguishing is needed here, evidenced by the fact that the new 
implementation of ValidateAndRead() doesn't ever return Status::EndOfFile.


Line 287:                          bool* is_partial_record);
is_partial_record isn't a great name here, because ValidateAndRead() doesn't 
read records, it reads whatever chunks of data other parts of the class have 
asked for.

Maybe something like '


-- 
To view, visit http://gerrit.cloudera.org:8080/2595
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b88ba051b390b6a2587eecdd205c478f1edccb4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

Reply via email to