> Adding this list of configs to the manual in a table would be
> generally useful I think (these and the lease ones below), especially
> if it had a note on what happens if you change the configs.
> The 69s timeout is a bit rough, especially if a read on another open
> file already figured the DN dead; ditto on the write.
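For reference, a sketch of the HDFS keys in play here; the key names are from the Hadoop 1.x era and the values are the commonly-cited defaults (illustrative, check hdfs-default.xml for your version):

```xml
<!-- hdfs-site.xml sketch; names/values are assumptions, verify per release -->
<property>
  <name>dfs.socket.timeout</name>
  <value>60000</value> <!-- ms; client read timeout on DN sockets -->
</property>
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- seconds; DN heartbeat to the NN -->
</property>
<property>
  <name>heartbeat.recheck.interval</name>
  <value>300000</value> <!-- ms; NN interval for re-checking stale DNs -->
</property>
```

With these defaults a DN is only declared dead after roughly 2 * recheck + 10 * heartbeat, i.e. the 10m30s interval discussed below.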
Agreed. I'm currently doing that. I also have a set of log analyses
that could make it into the ref book. I will create a Jira to propose
them.

>> On paper, it would be great to set "dfs.socket.timeout" to a minimal
>> value during a log split, as we know we will get a dead DN 33% of the
>> time. It may be more complicated in real life as the connections are
>> shared per process. And we could still have the issue with the
>> ipc.Client.
>
> Seems like we read the DFSClient.this.socketTimeout opening
> connections to blocks.
>
>> As a conclusion, I think it could be interesting to have a third
>> status for a DN in HDFS: between live and dead as today, we could
>> have "sick". We would have:
>> 1) Dead, known as such => As today: start to replicate the blocks to
>> other nodes. You enter this state after 10 minutes. We could even
>> wait longer.
>> 2) Likely to be dead: don't propose it for write blocks, give it a
>> lower priority for read blocks. We would enter this state under two
>> conditions:
>> 2.1) No heartbeat for 30 seconds (configurable of course). As there
>> is an existing heartbeat of 3 seconds, we could even be more
>> aggressive here.
>> 2.2) We could have a shutdown hook in HDFS such that when a DN dies
>> 'properly' it tells the NN, and the NN can put it in this 'half
>> dead' state.
>> => In all cases, the node stays in the second state until the 10m30s
>> timeout is reached or until a heartbeat is received.
>
> I suppose, as Todd suggests, we could do this client side. The extra
> state would complicate the NN (making it difficult to get such a
> change in).

After some iterations I came to a solution close to his proposition,
mentioned in my mail from yesterday. To me we should fix this, and
this includes HBASE-6401. The question is mainly which hdfs branch
hbase would need it on, as the HDFS code changed between the 1.0.3
release and branch 2. HADOOP-8144 is also important for people
configuring the topology imho.
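The proposed three-state model could be sketched as below. This is purely illustrative, not HDFS code; the class/method names and the 30s/10m30s thresholds are assumptions taken from the proposal above:

```java
// Illustrative sketch of the proposed live/sick/dead DN classification.
enum DataNodeState { LIVE, SICK, DEAD }

class DataNodeHealth {
    // Thresholds from the proposal: 30s to SICK, 10m30s to DEAD.
    static final long SICK_AFTER_MS = 30_000L;
    static final long DEAD_AFTER_MS = 630_000L;

    // A DN becomes SICK either by missing heartbeats for 30s or by
    // announcing its own shutdown (the "shutdown hook" idea, 2.2);
    // a fresh heartbeat would return it to LIVE.
    static DataNodeState classify(long msSinceLastHeartbeat,
                                  boolean announcedShutdown) {
        if (msSinceLastHeartbeat >= DEAD_AFTER_MS) {
            return DataNodeState.DEAD;
        }
        if (announcedShutdown || msSinceLastHeartbeat >= SICK_AFTER_MS) {
            return DataNodeState.SICK;
        }
        return DataNodeState.LIVE;
    }
}
```

A SICK node would be skipped for new write pipelines and deprioritized for reads, without triggering re-replication the way DEAD does.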
> The API to mark a DN dead seems like a nice-to-have. Master or
> client could pull on it when it knows a server is dead (not just the
> RS).

Yes, there is a mechanism today to tell the NN to decommission a DN,
but it's complex: we need to write a file with the 'unwanted' nodes,
and we need to tell the NN to reload it. Not really a 'mark as dead'
function.
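The existing flow looks roughly like this (the exclude-file path and hostname are illustrative; it assumes hdfs-site.xml already points dfs.hosts.exclude at that file):

```shell
# Add the unwanted DN to the exclude file the NN is configured to read.
echo "dn3.example.com" >> /etc/hadoop/conf/excludes

# Ask the NameNode to re-read its include/exclude lists (Hadoop 1.x CLI).
hadoop dfsadmin -refreshNodes
```

This starts a full decommission (blocks are re-replicated off the node), which is much heavier than the instant "mark as dead" hint discussed above.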
