Cool. So it's at least not just us :)

cheers
--
Torsten

On 06.09.2007, at 18:57, Hairong Kuang wrote:

Hi Torsten,

We occasionally see this too. But on a small scale cluster, you are more
likely to see this. I filed a jira at
https://issues.apache.org/jira/browse/HADOOP-1845.

Cheers,
Hairong

-----Original Message-----
From: Torsten Curdt [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 06, 2007 3:25 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"

We are still seeing a bunch of these, even with a reduced submit replication. Are we the only ones seeing those? If not, I'd run off and file a bug.

cheers
--
Torsten

On 30.08.2007, at 19:47, Hairong Kuang wrote:

The namenode does not schedule a block to a datanode that is confirmed to
hold a replica of the block. But it is not aware of any in-transit
block placement (i.e. a placement that has been scheduled but not yet
confirmed), so occasionally we may still see "is valid, and cannot be
written to" errors.

A fix would be to keep track of all in-transit block placements and
have the block placement algorithm consider these to-be-confirmed
replicas as well.
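The idea above can be sketched as follows. This is a hypothetical toy model, not Hadoop's actual namenode code: the chooser excludes both confirmed replica holders and nodes with a scheduled-but-unconfirmed ("in-transit") placement, so the same block is never scheduled to the same node twice.

```java
import java.util.*;

// Toy sketch of the proposed fix (all names hypothetical, not Hadoop APIs):
// target selection skips nodes that hold a confirmed replica AND nodes
// with an in-transit (scheduled but unconfirmed) placement of the block.
public class PlacementSketch {
    private final Map<String, Set<String>> confirmed = new HashMap<>();
    private final Map<String, Set<String>> inTransit = new HashMap<>();

    List<String> chooseTargets(String block, List<String> datanodes, int replication) {
        Set<String> excluded = new HashSet<>(confirmed.getOrDefault(block, Set.of()));
        excluded.addAll(inTransit.getOrDefault(block, Set.of()));
        List<String> targets = new ArrayList<>();
        for (String dn : datanodes) {
            if (targets.size() == replication) break;
            if (!excluded.contains(dn)) targets.add(dn);
        }
        // remember the chosen targets until the datanodes confirm receipt
        inTransit.computeIfAbsent(block, k -> new HashSet<>()).addAll(targets);
        return targets;
    }

    void confirm(String block, String datanode) {
        inTransit.getOrDefault(block, new HashSet<>()).remove(datanode);
        confirmed.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
    }

    public static void main(String[] args) {
        PlacementSketch ns = new PlacementSketch();
        List<String> nodes = List.of("dn1", "dn2", "dn3");
        List<String> first = ns.chooseTargets("blk_1", nodes, 2);
        // a second scheduling round must not pick dn1/dn2 again,
        // even though neither placement has been confirmed yet
        List<String> second = ns.chooseTargets("blk_1", nodes, 2);
        System.out.println(first);
        System.out.println(second);
    }
}
```

Without the in-transit set, the second round would happily pick dn1/dn2 again, which is exactly the double-scheduling the thread describes.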

Hairong

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 30, 2007 10:28 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"

Raghu Angadi wrote:
Torsten Curdt wrote:
I just checked our mapred.submit.replication and it is higher than
the number of nodes in the cluster - maybe that's the problem?

This pretty much assures at least a few of these exceptions.

So we have a workaround: lower mapred.submit.replication.  And it's
arguably not a bug, but just a misfeature, since it only causes
spurious warnings.
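For reference, the workaround is a single property in the job or site configuration; a value no larger than the number of datanodes avoids the over-scheduling (the value 3 below is just an example for a small cluster):

```xml
<!-- e.g. in hadoop-site.xml: keep this at or below the cluster size -->
<property>
  <name>mapred.submit.replication</name>
  <value>3</value>
</property>
```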

One fix might be to determine mapred.submit.replication based on the
cluster size.  But that was contentious when the feature was added,
and I'd rather not re-open that argument now.
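The cluster-size-based idea amounts to a clamp. A hypothetical one-liner (not what Hadoop does, and the names are made up) just to make the suggestion concrete:

```java
public class ClampReplication {
    // Hypothetical helper: never request more replicas than live datanodes.
    static int effectiveSubmitReplication(int configured, int liveDatanodes) {
        return Math.min(configured, Math.max(1, liveDatanodes));
    }

    public static void main(String[] args) {
        // configured replication 10 on a 4-node cluster gets clamped to 4
        System.out.println(effectiveSubmitReplication(10, 4));
    }
}
```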

You can argue that the namenode should not schedule a block to a node
twice... and I agree.

That sounds like a good thing to fix.  Should we file a bug?

Doug



