[ https://issues.apache.org/jira/browse/HDFS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HDFS-1264.
------------------------------------
Resolution: Fixed
> 0.20: OOME in HDFS client made an unrecoverable HDFS block
> ----------------------------------------------------------
>
> Key: HDFS-1264
> URL: https://issues.apache.org/jira/browse/HDFS-1264
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs-client
> Affects Versions: 0.20-append
> Reporter: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: blk_logs_sorted.txt, hdfs-679-testcase-20.txt
>
>
> Ran into a bad issue in testing overnight. One of the writers experienced an
> OOME in the middle of writing a checksum chunk to the stream inside a sync()
> call. It then proceeded to retry recovery on each DN in the pipeline, but
> each recovery failed because the writer's internal checksum buffer was borked
> in some way - on the DNs I see "Unexpected checksum mismatch" errors after
> each recovery attempt.
> When another client tried to recover the file using appendFile, it got the
> "Partial CRC 3766269197 does not match value computed the last time file was
> closed" error (plus there was only one replica left in targets). It thus
> failed to set up the append pipeline, and ran into HDFS-1262.
> This was on 0.20-append, though it may happen on trunk as well.