[ 
https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516456#comment-16516456
 ] 

Mike Percy commented on KUDU-2260:
----------------------------------

[~wdberkeley] looked into this a bit today after it appeared again in the 
wild and found [this 
thread|https://plus.google.com/+KentonVarda/posts/JDwHfAiLGNQ], in which Ted Ts'o 
discusses this situation and notes that ext4 may flush the new file size before 
the data makes it to disk. The one guarantee you get is that when that happens 
you will read null bytes at the end of the file (instead of garbage data). So 
it seems like we should look for trailing all-null records at the end of these 
files and ignore them when opening log block containers.
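
As a rough illustration of that check (not Kudu's actual code, which is C++), 
here is a minimal Python sketch. It assumes the reader already knows the offset 
at which record parsing failed, such as the offset reported in the checksum 
error; the function name tail_is_all_zeros and the chunk_size parameter are 
hypothetical. It only reports whether everything from that offset to the end of 
the file is null bytes, which is the case that could safely be ignored.

```python
def tail_is_all_zeros(path, offset, chunk_size=64 * 1024):
    """Return True if every byte from `offset` to end-of-file is 0x00.

    `offset` is assumed to be where the last record failed to parse.
    An empty tail (offset at or past end-of-file) also returns True.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end-of-file without seeing a non-zero byte.
                return True
            if chunk.strip(b"\x00"):
                # Something other than null bytes: treat as real corruption.
                return False
```

If this returns True, a loader could treat the tail as an incomplete write 
rather than as corruption; if it returns False, failing loudly still seems like 
the right behavior.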

One thing that wasn't clear from my reading of that thread is whether writes 
need to be sector-aligned to avoid torn writes, or whether the filesystem will 
avoid crossing a sector boundary in all cases for a single write that is 
shorter than one sector.
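
To make the question concrete, the arithmetic for whether a given write spans 
more than one sector can be sketched as follows. This does not answer what ext4 
guarantees; it only shows which writes would be exposed if no such guarantee 
exists. The function name and the 512-byte default sector size are assumptions 
(devices with 4096-byte sectors are also common).

```python
def crosses_sector_boundary(offset, length, sector_size=512):
    """Return True if a write of `length` bytes at `offset` touches
    more than one sector of size `sector_size`."""
    if length <= 0:
        return False
    first_sector = offset // sector_size
    last_sector = (offset + length - 1) // sector_size
    return first_sector != last_sector
```

For an append-only metadata file, the offset is simply the current file size, 
so a write shorter than one sector can still straddle two sectors unless 
records are padded or aligned.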

> Log block manager should handle null bytes in metadata on crash
> ---------------------------------------------------------------
>
>                 Key: KUDU-2260
>                 URL: https://issues.apache.org/jira/browse/KUDU-2260
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>            Reporter: Mike Percy
>            Priority: Major
>
> The log block manager currently may leave null bytes at the end of the 
> metadata log file if there is a system crash in the middle of a write. The 
> log block manager should detect null bytes at the end of a metadata entry on 
> startup and potentially truncate the entry or close the container.
> Currently, it prints an error along the following lines:
> {code}
> F0111 09:30:27.327011 28843 tablet_server_main.cc:64] Check failed: _s.ok() 
> Bad status: Corruption: Failed to load FS layout: Could not read records from 
> container /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e: Data length 
> checksum does not match: Incorrect checksum in file 
> /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e.metadata at offset 372533: 
> Checksum does not match. Expected: 0. Actual: 1323915147
> {code}
> At the time of writing, the workaround for this issue is to truncate the 
> affected file at the start of the incomplete entry in the file. While this 
> may leave orphaned blocks, this should be safe because if the metadata entry 
> was never successfully written then it should not have been considered 
> durable, either.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
