Hi Joydeep, Thanks for your comments. Really appreciate it.
For the Namenode configuration, please see if you can use most of the memory available on the machine. A JVM parameter of -Xmx7000m or so should do it. Also, you might want to bump up the number of Namenode handler threads, dfs.namenode.handler.count. By default this is set to 10. It might make sense to set this to 40 or so.

Thanks,
dhruba

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 13, 2007 2:45 PM
To: [email protected]
Subject: RE: ipc.client.timeout

- fixed namenode to not be data/task node
- 31K files right now
- haven't played around with memory options - namenode still running with -Xmx1000m - I can bump this up (8G memory available)

Btw - from what I see in code - the server is likely discarding the client call (and not performing the operation at all). Another (dumber) approach for handling the idempotency issue would be for the client to retry anyway - in most cases, the server would not have performed the operation. In the minority of cases where the server already performed the operation, the client can report a timeout error (instead of the actual error), i.e. it's almost as if the last retry was not performed.

(There could be some flaw in this logic - just can't think of one right now.)

-----Original Message-----
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 13, 2007 2:21 PM
To: [email protected]
Subject: RE: ipc.client.timeout

We have discussed the approach of remembering completed RPCs (and their status codes, return parameters, etc.) so that a retry of a previously executed RPC can get back identical results. But we have not implemented this yet.

In the short term, it would be nice if you can make the Namenode run on a dedicated machine (no Datanodes, tasktrackers, etc. on this machine). Also, how many files does your cluster have, and how much main memory is on the Namenode machine? How much memory is the Namenode JVM configured to use?

Thanks,
dhruba

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 13, 2007 2:16 PM
To: [email protected]
Subject: RE: ipc.client.timeout

Learning the hard way :-)

Second Ted's last mail (all the way back to Sun RPC - server can keep track of completed RPC calls and reply success to client retries if op already performed).

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 13, 2007 1:54 PM
To: [email protected]
Subject: Re: ipc.client.timeout

Joydeep Sen Sarma wrote:
> Quite likely it's because the namenode is also a data/task node.

That doesn't sound like a "best practice"...

Doug
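
For readers following the thread, the "remember completed RPCs" idea can be sketched roughly as below. This is only an illustration in Python, not Hadoop code: the names RetryCache, execute, client_id and call_id are hypothetical, and the sketch assumes each client tags a call with an id that it reuses on retry. Eviction of old entries is left out.

```python
from typing import Any, Callable, Dict, Tuple


class RetryCache:
    """Hypothetical server-side cache of completed calls (not a Hadoop API).

    Outcomes are keyed by (client_id, call_id), so a retry of a call that
    already ran gets the recorded result back instead of running again.
    """

    def __init__(self) -> None:
        self._done: Dict[Tuple[str, int], Tuple[bool, Any]] = {}

    def execute(self, client_id: str, call_id: int, op: Callable[[], Any]) -> Any:
        key = (client_id, call_id)
        if key not in self._done:
            # First time this call is seen: run it and record the outcome,
            # whether it returned a value or raised an error.
            try:
                self._done[key] = (True, op())
            except Exception as exc:
                self._done[key] = (False, exc)
        ok, value = self._done[key]
        if ok:
            return value
        raise value


# Toy namespace standing in for the namenode's file table.
namespace = set()


def create(path: str) -> str:
    """A non-idempotent operation: creating the same path twice is an error."""
    if path in namespace:
        raise FileExistsError(path)
    namespace.add(path)
    return "created " + path


cache = RetryCache()
first = cache.execute("client-1", 7, lambda: create("/tmp/a"))
# The client timed out and retries with the same call id: the recorded
# result is replayed and create() is not run a second time.
retry = cache.execute("client-1", 7, lambda: create("/tmp/a"))
```

Here the retry sees the same answer as the original call even though create is not idempotent. A real server would also need to bound the cache, for example by expiring entries after some time.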

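
The client-side alternative raised in the thread (retry anyway, and report a timeout rather than the actual error if an earlier attempt timed out) might look something like the following. Again this is a sketch under assumptions, not the Hadoop IPC client: CallTimeout, call_with_retry and the send callable are hypothetical names.

```python
from typing import Any, Callable


class CallTimeout(Exception):
    """Hypothetical error meaning the client gave up waiting for a reply."""


def call_with_retry(send: Callable[[], Any], max_attempts: int = 3) -> Any:
    """Retry a call on timeout.

    If an attempt fails with a real error after an earlier attempt timed
    out, that error may only be a side effect of the earlier attempt having
    been applied, so a timeout is reported instead of the actual error.
    """
    timed_out_before = False
    for _ in range(max_attempts):
        try:
            return send()
        except CallTimeout:
            timed_out_before = True
        except Exception:
            if timed_out_before:
                # Outcome of the earlier attempt is unknown; hide the error.
                raise CallTimeout("outcome unknown after retry") from None
            raise
    raise CallTimeout("all attempts timed out")
```

With this shape, a call that the server silently dropped succeeds on the retry, while a call that was applied before the timeout surfaces as a timeout rather than as a misleading "already exists" style error.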