[ https://issues.apache.org/jira/browse/KUDU-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo updated KUDU-1508:
-----------------------------
    Attachment: replay_container.py
                pbc_dump.txt
                filefrag.txt
                debugfs.txt

First some stats:
* Six-node cluster running el6.6.
* Each node has twelve 2 TB drives formatted as ext4 with a 2048-byte block 
size.
* The first node runs the Kudu master, while the rest run tservers.
* All 12 drives in the Kudu master's node were clean; I'll skip them for the 
remainder of the analysis.
* Each of the remaining five nodes has ~120,000 containers, the vast majority 
of which are full.
* In total, three machines have two corrupt inodes each, one machine has three 
corrupt inodes, and one has 12 corrupt inodes.

I focused on one of the inodes on the machine with 12 corrupt inodes. It is 
indeed a container data file. The container was limited to 1353 blocks (full 
'kudu pbc dump' output attached), per the investigation done in commit 
4923a74. Of those, 1078 were deleted, leaving 275 live blocks. filefrag shows 
that the file has 214 extents (full output attached), and debugfs (full output 
attached) shows one level 0 interior node and four level 1 interior nodes 
backing that extent tree.
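
For reference, the inspection boils down to two commands. Here's a sketch in 
Python; the file path, device, and inode number below are illustrative 
placeholders, not the actual ones from this cluster:

{code:python}
import subprocess

# Count extents as the kernel reports them via FIEMAP.
subprocess.check_call(
    ["filefrag", "-v", "/data1/kudu/data/example_container.data"])

# Walk the on-disk extent tree, interior nodes included, without relying
# on the mounted view. debugfs's <N> syntax addresses an inode directly;
# 5259348 is borrowed from the fsck output quoted below purely as an example.
subprocess.check_call(
    ["debugfs", "-R", "dump_extents <5259348>", "/dev/sdb1"])
{code}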

I wrote a script (attached) to replay the container as the LBM would have 
written it. The script parses an LBM container metadata file via 
dump_all_blocks.py (see KUDU-1856), and adheres to LBM semantics in many ways 
during replay, including preallocation, hole punching (with proper alignment), 
and fdatasync. It obviously doesn't have access to the original data, so it 
just writes out zeroes.
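
To give a flavor of the replay semantics, below is a minimal sketch of the 
punch-and-sync step. The inward-rounding alignment rule shown here is an 
assumption for illustration; the actual logic is in the attached 
replay_container.py.

{code:python}
import ctypes
import ctypes.util
import os

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# fallocate(2) takes two off_t arguments; declare them as 64-bit.
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_int64, ctypes.c_int64]

FALLOC_FL_KEEP_SIZE = 0x01   # don't change the apparent file size
FALLOC_FL_PUNCH_HOLE = 0x02  # deallocate the byte range
FS_BLOCK_SIZE = 2048         # matches the ext4 block size under test

def punch_and_sync(fd, offset, length):
    # Round the punched range inward to whole filesystem blocks so we
    # never deallocate a partial block shared with neighboring live data.
    start = -(-offset // FS_BLOCK_SIZE) * FS_BLOCK_SIZE       # round up
    end = (offset + length) // FS_BLOCK_SIZE * FS_BLOCK_SIZE  # round down
    if end > start:
        if libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          start, end - start) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    os.fdatasync(fd)
{code}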

I created a 16G loopback-mounted ext4 filesystem with a 2048-byte block size 
and replayed the container into it. After unmounting and fsck'ing it, I 
couldn't reproduce the corruption.
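
For anyone who wants to repeat the experiment, the test environment can be 
set up roughly like this (a sketch; the paths are placeholders and the 
mount/umount steps need root):

{code:python}
import os
import subprocess

img, mnt = "/tmp/ext4-2k.img", "/mnt/replay"   # hypothetical paths

# 16G sparse backing file, formatted ext4 with a 2048-byte block size.
with open(img, "wb") as f:
    f.truncate(16 * 1024 ** 3)
subprocess.check_call(["mkfs.ext4", "-F", "-b", "2048", img])

if not os.path.isdir(mnt):
    os.makedirs(mnt)
subprocess.check_call(["mount", "-o", "loop", img, mnt])

# ... replay the container into mnt with replay_container.py ...

subprocess.check_call(["umount", mnt])
subprocess.check_call(["e2fsck", "-f", img])  # force the check run at boot
{code}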

In terms of next steps, I could investigate some of the other corrupt 
containers to see if they differ from the one I arbitrarily chose. We could 
also decide that, relative to the total number of blocks written, there were 
so few occurrences of this corruption that it doesn't warrant further 
attention. Or we could take more drastic action. Please chime in if you have 
thoughts.


> Log block manager triggers ext4 hole punching bug in el6
> --------------------------------------------------------
>
>                 Key: KUDU-1508
>                 URL: https://issues.apache.org/jira/browse/KUDU-1508
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Assignee: Adar Dembo
>            Priority: Blocker
>             Fix For: 1.2.0
>
>         Attachments: debugfs.txt, e9f83e4acef3405f99d01914317351ce.metadata, 
> filefrag.txt, pbc_dump.txt, replay_container.py
>
>
> I've experienced many times that when I reboot an el6 node that was running 
> Kudu tservers, fsck reports issues like:
>
> data6 contains a file system with errors, check forced.
> data6: Interior extent node level 0 of inode 5259348:
> Logical start 154699 does not match logical start 2623046 at next level.
>
> After some investigation, I've determined that this is due to an ext4 kernel 
> bug: https://patchwork.ozlabs.org/patch/206123/
> Details in a comment to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
