[ 
https://issues.apache.org/jira/browse/HADOOP-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657658#action_12657658
 ] 

Brian Bockelman commented on HADOOP-4866:
-----------------------------------------

Hey Nicholas,

Things didn't stabilize.

Looking at the lsof output, there were some very long-lived clients (>10 days 
old?).  Oddly enough, these clients seem to have survived cluster reboots and 
namenode restarts, even though the files they were writing were deleted a long 
time ago.  I killed the clients and the datanodes quieted down (i.e., the 
problem stabilized).

This particular client, which uses libhdfs, has had problems with infinite loops 
before (due to our problems, not libhdfs problems).  However, I'd claim that if 
the file fails, there shouldn't be any way that repeated reads could cause 
problems.

That is, if read is called on DFSClient repeatedly even after errors, it 
shouldn't cause any issues on the DN side.
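
For illustration only, here is a minimal sketch of how a client-side read loop 
could bound its retries instead of spinning forever after a failure.  The 
read_fn callback, read_all_bounded, and always_fail are hypothetical names 
made up for this sketch; the callback only mimics the general shape of a read 
call (bytes read, 0 at EOF, -1 on error) and is not the libhdfs API or our 
actual client code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read callback: returns bytes read, 0 at EOF, -1 on error. */
typedef long (*read_fn)(void *ctx, void *buf, size_t len);

/* Read until EOF, but give up after max_errors consecutive failures
 * rather than retrying forever.  Returns the total number of bytes read,
 * or -1 if the error limit was reached. */
long read_all_bounded(read_fn rd, void *ctx, void *buf, size_t len,
                      int max_errors)
{
    long total = 0;
    int errors = 0;

    assert(max_errors > 0);
    for (;;) {
        long n = rd(ctx, buf, len);
        if (n > 0) {
            total += n;
            errors = 0;          /* a successful read resets the counter */
        } else if (n == 0) {
            return total;        /* EOF */
        } else if (++errors >= max_errors) {
            return -1;           /* stop instead of looping on a dead file */
        }
    }
}

/* Demo stub: a reader that always fails and counts its calls via ctx. */
long always_fail(void *ctx, void *buf, size_t len)
{
    (void)buf;
    (void)len;
    ++*(int *)ctx;
    return -1;
}
```

With always_fail as the callback and max_errors set to 3, the loop gives up 
after three calls.  That said, a cap like this only protects the client; the 
point above still stands that the DN side shouldn't be bothered by repeated 
reads either way.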

Brian

> NameNode error in commitBlockSynchronization
> --------------------------------------------
>
>                 Key: HADOOP-4866
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4866
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Brian Bockelman
>         Attachments: 4866_20081215.patch, 4866_20081216.patch, 
> 4866_20081217.patch
>
>
> The NameNode continuously reports an error in commitBlockSynchronization.  
> This happens for ~5 blocks at a rate of 5-10 Hz.  I have no idea when this 
> started happening because it has been going on for days, well past the 
> start of our current logs.
> This appears to be a new symptom in 0.19.0, but I have no idea what could be 
> causing it.

