On Thu, Jul 12, 2012 at 11:20 PM, N Keywal <[email protected]> wrote:
> I looked in details at the hdfs configuration parameters and their
> impacts. We have the calculated values:
> heartbeat.interval = 3s ("dfs.heartbeat.interval").
> heartbeat.recheck.interval = 300s ("heartbeat.recheck.interval")
> heartbeatExpireInterval = 2 * 300 + 10 * 3 = 630s => 10 minutes 30 seconds
>

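(Illustrative sketch of the expiry arithmetic quoted above; the class and
method names are made up, not actual HDFS code:)

```java
// Sketch of how the NN's datanode-expiry window is derived from the two
// heartbeat settings quoted above. Names here are illustrative only.
public class ExpiryCalc {
    // heartbeatExpireInterval = 2 * recheck + 10 * heartbeat
    public static long heartbeatExpireMillis(long heartbeatIntervalSec,
                                             long recheckIntervalSec) {
        return (2 * recheckIntervalSec + 10 * heartbeatIntervalSec) * 1000L;
    }

    public static void main(String[] args) {
        // Defaults: heartbeat 3s, recheck 300s -> 630,000 ms = 10m30s.
        System.out.println(heartbeatExpireMillis(3, 300));
    }
}
```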
...

> connect/read:  (3s (hardcoded) * NumberOfReplica) + 60s ("dfs.socket.timeout")
> write: (5s (hardcoded) * NumberOfReplica) + 480s  
> ("dfs.datanode.socket.write.timeout")
>
> That will set a 69s timeout to get a "connect" error with the default config.
>

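(Same thing for the client-side timeouts; constants mirror the hardcoded
per-replica steps quoted above, everything else is made up for illustration:)

```java
// Hedged sketch of the connect/read and write timeout arithmetic quoted
// above; not actual DFSClient code.
public class DfsClientTimeouts {
    static final int CONNECT_STEP_SEC = 3; // hardcoded per-replica connect step
    static final int WRITE_STEP_SEC = 5;   // hardcoded per-replica write step

    // connect/read: 3s * replicas + dfs.socket.timeout
    static int readTimeoutSec(int replicas, int dfsSocketTimeoutSec) {
        return CONNECT_STEP_SEC * replicas + dfsSocketTimeoutSec;
    }

    // write: 5s * replicas + dfs.datanode.socket.write.timeout
    static int writeTimeoutSec(int replicas, int writeSocketTimeoutSec) {
        return WRITE_STEP_SEC * replicas + writeSocketTimeoutSec;
    }

    public static void main(String[] args) {
        // Default config, 3 replicas: 3*3 + 60 = 69s before a connect error.
        System.out.println(readTimeoutSec(3, 60));
        // Write path: 5*3 + 480 = 495s.
        System.out.println(writeTimeoutSec(3, 480));
    }
}
```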
Adding this list of configs to the manual in a table would be
generally useful I think (these and the lease ones below), especially
if had a note on what happens if you change the configs.
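Something like this in hdfs-site.xml, say (defaults as I understand them
for this era of Hadoop; worth double-checking names and units against your
version before putting it in the manual):

```xml
<!-- Heartbeat/timeout knobs discussed in this thread. Note the mixed
     units: some values are seconds, some milliseconds. -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- seconds -->
</property>
<property>
  <name>heartbeat.recheck.interval</name>
  <value>300000</value> <!-- milliseconds (300s) -->
</property>
<property>
  <name>dfs.socket.timeout</name>
  <value>60000</value> <!-- milliseconds (60s) -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value> <!-- milliseconds (480s) -->
</property>
```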

The 69s timeout is a bit rough, especially if a read on another open
file has already figured the DN dead; ditto on the write.

> On paper, it would be great to set "dfs.socket.timeout" to a minimal
> value during a log split, as we know we will get a dead DN 33% of the
> time. It may be more complicated in real life as the connections are
> shared per process. And we could still have the issue with the
> ipc.Client.
>

Seems like we use DFSClient.this.socketTimeout when opening
connections to blocks.

> As a conclusion, I think it could be interesting to have a third
> status for DN in HDFS: between live and dead as today, we could have
> "sick". We would have:
> 1) Dead, known as such => As today: Start to replicate the blocks to
> other nodes. You enter this state after 10 minutes. We could even wait
> more.
> 2) Likely to be dead: don't propose it for write blocks, put it with a
> lower priority for read blocks. We would enter this state in two
> conditions:
>   2.1) No heartbeat for 30 seconds (configurable of course). As there
> is an existing heartbeat of 3 seconds, we could even be more
> aggressive here.
>   2.2) We could have a shutdown hook in hdfs such as when a DN dies
> 'properly' it says to the NN, and the NN can put it in this 'half dead
> state'.
>   => In all cases, the node stays in the second state until the 630s
> (10m30s) expiry is reached or until a heartbeat is received.

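The proposed three-state classification is simple enough to sketch (all
names below are hypothetical; nothing like this exists in HDFS today):

```java
// Illustrative sketch of the live/sick/dead proposal quoted above.
// Thresholds and names are made up for the example.
public class DnLiveness {
    enum State { LIVE, SICK, DEAD }

    // sickThresholdSec would be the configurable 30s from 2.1;
    // deadThresholdSec the existing 630s expiry from 1).
    static State classify(long secsSinceHeartbeat,
                          long sickThresholdSec,
                          long deadThresholdSec) {
        if (secsSinceHeartbeat >= deadThresholdSec) return State.DEAD;
        if (secsSinceHeartbeat >= sickThresholdSec) return State.SICK;
        return State.LIVE; // heartbeat received recently -> back to live
    }

    public static void main(String[] args) {
        System.out.println(classify(10, 30, 630));  // LIVE
        System.out.println(classify(45, 30, 630));  // SICK
        System.out.println(classify(700, 30, 630)); // DEAD
    }
}
```

A clean shutdown hook (2.2) or an explicit "mark dead" API would just
transition a node straight into SICK without waiting on the threshold.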
I suppose, as Todd suggests, we could do this client side.  The extra
state would complicate the NN (making it difficult to get such a change
in).  The API to mark a DN dead seems like a nice-to-have.  The Master or
client could pull on it when it knows a server is dead (not just the RS).

St.Ack
