[ https://issues.apache.org/jira/browse/HDFS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-1264.
------------------------------------

    Resolution: Fixed

> 0.20: OOME in HDFS client made an unrecoverable HDFS block
> ----------------------------------------------------------
>
>                 Key: HDFS-1264
>                 URL: https://issues.apache.org/jira/browse/HDFS-1264
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: blk_logs_sorted.txt, hdfs-679-testcase-20.txt
>
>
> Ran into a bad issue in testing overnight. One of the writers experienced an 
> OOME in the middle of writing a checksum chunk to the stream inside a sync() 
> call. It then proceeded to retry recovery on each DN in the pipeline, but 
> each recovery failed because the writer's internal checksum buffer was 
> corrupted in some way; on the DNs I see "Unexpected checksum mismatch" 
> errors after each recovery attempt.
> When another client tried to recover the file using appendFile, it got the 
> "Partial CRC 3766269197 does not match value computed the  last time file was 
> closed" error (plus there was only one replica left in targets). It thus 
> failed to set up the append pipeline, and ran into HDFS-1262.
> This was on 0.20-append, though it may happen on trunk as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
