- fixed namenode to not be data/task node
- 31K files right now
- haven't played around with memory options yet - namenode is still
running with -Xmx1000m - I can bump this up (8G memory available)

Btw - from what I see in the code - the server is likely discarding the
client call (and not performing the operation at all). Another (dumber)
approach to handling the idempotency issue would be for the client to
retry anyway - in most cases, the server would not have performed the
operation. In the minority of cases where the server had already
performed the operation, the retry may fail, and the client can report
a timeout error (instead of the actual error) - i.e. it's almost as if
the last retry was never performed. (There could be some flaw in this
logic - just can't think of one right now.)
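
A rough sketch of the retry-anyway idea described above, in Python rather
than the actual Hadoop IPC client code. The names RpcTimeout and
call_with_retry are hypothetical and only stand in for whatever the real
client does; the point is just that an error seen on a retry after a
timeout gets reported as a timeout, since the earlier attempt may have
been performed.

```python
class RpcTimeout(Exception):
    """Hypothetical error: no reply arrived within the client timeout."""


def call_with_retry(call, max_attempts=3):
    """Invoke call() and retry on timeout, up to max_attempts times.

    If an attempt fails with a real error after an earlier attempt timed
    out, the error is ambiguous: the server may have performed the first
    attempt, so the failure may just be a side effect of running the
    operation twice. In that case report a timeout instead.
    """
    timed_out_before = False
    for _ in range(max_attempts):
        try:
            return call()
        except RpcTimeout:
            # No reply. The server most likely dropped the call, so retry.
            timed_out_before = True
        except Exception as err:
            if timed_out_before:
                # Hide the (possibly misleading) error behind a timeout.
                raise RpcTimeout("timed out; outcome unknown") from err
            # First attempt failed cleanly: report the actual error.
            raise
    raise RpcTimeout("no reply after %d attempts" % max_attempts)
```

One caveat with this sketch: a retry that succeeds after the first attempt
was already performed would run the operation twice, which only matters
for operations that are not idempotent.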

-----Original Message-----
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 13, 2007 2:21 PM
To: [email protected]
Subject: RE: ipc.client.timeout

We have discussed the approach of remembering completed RPCs (and their
status codes, return parameters, etc.) so that a retry of a previously
executed RPC can get back identical results. But we have not implemented
this yet.
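
To make the idea concrete, here is a minimal sketch in Python of what
remembering completed RPCs could look like. It is not Hadoop code (nothing
like this is implemented yet); RetryCache, client_id and call_id are
hypothetical names, and the sketch assumes each client tags a call and its
retries with the same id. It ignores eviction, persistence and thread
safety.

```python
class RetryCache:
    """Remembers the outcome of completed calls, keyed by (client, call id)."""

    def __init__(self):
        # (client_id, call_id) -> ("ok", result) or ("err", exception)
        self._completed = {}

    def execute(self, client_id, call_id, operation):
        """Run operation() once per key; replay the stored outcome on retries."""
        key = (client_id, call_id)
        if key not in self._completed:
            try:
                outcome = ("ok", operation())
            except Exception as exc:
                # Failures are remembered too, so a retry sees the same error.
                outcome = ("err", exc)
            self._completed[key] = outcome
        status, value = self._completed[key]
        if status == "err":
            raise value
        return value
```

With something like this, a retried mkdir that already succeeded would get
the original success back instead of an "already exists" error. A real
version would need to bound the cache, for example by expiring entries
after some time.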

In the short term, it would be nice if you can make the Namenode run on
a dedicated machine (no Datanodes, tasktrackers, etc. on this machine).
Also, how many files does your cluster have and how much main memory is
on the Namenode machine? How much memory is the Namenode JVM configured
to use?

Thanks,
dhruba


-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 13, 2007 2:16 PM
To: [email protected]
Subject: RE: ipc.client.timeout

Learning the hard way :-)

I second Ted's last mail (this goes all the way back to Sun RPC - the
server can keep track of completed RPC calls and reply success to client
retries if the op was already performed).

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 13, 2007 1:54 PM
To: [email protected]
Subject: Re: ipc.client.timeout

Joydeep Sen Sarma wrote:
> Quite likely it's because the namenode is also a data/task node. 

That doesn't sound like a "best practice"...

Doug
