[ 
https://issues.apache.org/jira/browse/HDFS-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918778#comment-13918778
 ] 

Chris Nauroth commented on HDFS-5995:
-------------------------------------

bq. Why doesn't the CRC check for each opcode catch this problem?

That's because the CRC check happens after decoding the op, but it's during 
decoding that the ops do their interesting deserialization work.  This problem 
isn't specific to the ACL ops, but in the particular case here, 
{{AddCloseOp#readFields}} reads the variable-length array of ACL entries.  With 
a corrupted entry count, we try to allocate a very large array before the CRC 
check has a chance to reject the op.
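
As a rough illustration of that ordering, here is a hypothetical decoder (not 
the actual {{readFields}} code; the class name, the 4-byte count prefix, and 
the entry size are all assumptions).  The size requested for the array comes 
straight from bytes that no checksum has vouched for yet:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch; none of these names exist in HDFS.
final class DecodeBeforeVerifySketch {
  // Assumed size of one serialized entry, for illustration only.
  static final int ENTRY_BYTES = 4;

  /**
   * Returns the number of bytes a count-prefixed decoder would ask for,
   * computed before any checksum has been consulted.
   */
  static long requestedBytes(byte[] record) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
    int count = in.readInt(); // trusted blindly: corruption here goes unnoticed
    return (long) count * ENTRY_BYTES;
  }
}
```

Flipping bits in the high byte of the count is enough to turn a request for a 
few bytes into one far beyond a typical heap.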

A potentially interesting enhancement would be to serialize each op as 
<checksum><length><payload>.  Then, the first step of reading the edit log 
would be to read checksum and length followed by exactly <length> bytes into a 
payload buffer.  (Note no deserialization attempted yet.)  Then, verify the 
checksum before attempting deserialization of the payload buffer.  I don't 
believe this is a quick and easy change.  I think we'd need to touch the code 
for every op, because right now, the only thing that governs the expected 
length of an op is the point at which the logic inside the op's {{readFields}} 
method decides to stop pulling bytes.
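
The framing described above could be sketched in Java roughly as follows.  This 
is only an illustration, not a patch: {{FramedOpSketch}}, {{writeFrame}}, 
{{readFrame}}, and {{MAX_PAYLOAD_LENGTH}} are hypothetical names, and the use of 
CRC32 with an 8-byte checksum field and a 4-byte length field is an assumption.  
The upper bound on the length is an added guard, since the length itself has to 
be read before anything has been verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names exist in HDFS.
final class FramedOpSketch {
  // Assumed cap on one op's payload, so a corrupt length cannot trigger
  // a huge allocation before the checksum is verified.
  static final int MAX_PAYLOAD_LENGTH = 1 << 20;

  /** Serializes one record as <checksum><length><payload>. */
  static byte[] writeFrame(byte[] payload) throws IOException {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeLong(crc.getValue());
    out.writeInt(payload.length);
    out.write(payload);
    out.flush();
    return bytes.toByteArray();
  }

  /**
   * Reads one record and returns its payload only if the checksum matches.
   * No deserialization of the payload is attempted here.
   */
  static byte[] readFrame(DataInputStream in) throws IOException {
    long expected = in.readLong();
    int length = in.readInt();
    if (length < 0 || length > MAX_PAYLOAD_LENGTH) {
      throw new IOException("implausible payload length: " + length);
    }
    byte[] payload = new byte[length];
    in.readFully(payload); // throws EOFException if the stream is too short
    CRC32 crc = new CRC32();
    crc.update(payload, 0, length);
    if (crc.getValue() != expected) {
      throw new IOException("checksum mismatch");
    }
    return payload;
  }
}
```

A caller would hand the returned buffer to the op's deserialization logic only 
after {{readFrame}} returns without throwing.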

> TestFSEditLogLoader#testValidateEditLogWithCorruptBody gets OutOfMemoryError 
> and dumps heap.
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5995
>                 URL: https://issues.apache.org/jira/browse/HDFS-5995
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: namenode, test
>    Affects Versions: 3.0.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>         Attachments: HDFS-5995.1.patch
>
>
> {{TestFSEditLogLoader#testValidateEditLogWithCorruptBody}} is experiencing 
> {{OutOfMemoryError}} and dumping heap since the merge of HDFS-4685.  This 
> doesn't actually cause the test to fail, because it's a failure test that 
> corrupts an edit log intentionally.  Still, this might cause confusion if 
> someone reviews the build logs and thinks this is a more serious problem.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
