Can idempotency be retrofitted with a client-generated random key? That
way the server can remember recent transaction keys (say, the last minute
of keys) and ignore redundant requests.
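
Something like the sketch below is what I have in mind (all names are
made up for illustration - this is not a concrete patch against the
Hadoop IPC code). The server keeps a short-lived map of keys it has
already acted on and drops any request whose key it has seen inside the
window:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Sketch of server-side duplicate suppression keyed on a
     * client-generated random key. Hypothetical names throughout.
     */
    public class IdempotencyCache {
        private static final long WINDOW_MS = 60000; // ~1 minute of keys

        // key -> arrival time of the first request carrying that key
        private final Map<UUID, Long> recentKeys =
            new ConcurrentHashMap<>();

        /** Returns true the first time a key is seen inside the window. */
        public boolean firstTime(UUID clientKey) {
            long now = System.currentTimeMillis();
            evictOld(now);
            // putIfAbsent is atomic, so a retry racing the original loses
            return recentKeys.putIfAbsent(clientKey, now) == null;
        }

        private void evictOld(long now) {
            recentKeys.values().removeIf(t -> now - t > WINDOW_MS);
        }
    }

The client would generate one UUID per logical operation and resend the
same key on every retry; the server executes the call only when
firstTime() returns true. A full version would also have to remember the
first response, so a duplicate can be answered rather than just dropped.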


On 9/13/07 1:37 PM, "Dhruba Borthakur" <[EMAIL PROTECTED]> wrote:

> Hi Jaydeep,
> 
> The idea is to retry only those operations that are idempotent. addBlock
> and mkdirs are non-idempotent, and that's why there are no retries for
> these calls.
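
To make that concrete: a retry loop is only safe around a call whose
effect is the same whether it runs once or twice. A rough sketch of that
discipline - invented names, not the actual DFSClient code:

    import java.io.IOException;
    import java.util.concurrent.Callable;

    /** Illustrative only: a wrapper applied solely to idempotent RPCs.
     *  Re-running a non-idempotent call such as addBlock could allocate
     *  a second block, so such calls get exactly one attempt. */
    public final class IdempotentRetry {
        public static <T> T call(Callable<T> idempotentRpc, int maxAttempts)
                throws Exception {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("need >= 1 attempt");
            }
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return idempotentRpc.call();
                } catch (IOException transientFailure) {
                    last = transientFailure; // safe: the call is idempotent
                }
            }
            throw last;
        }
    }

So a read-only call like exists() could be wrapped this way, while
addBlock would be invoked directly, with no wrapper.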
> 
> Can you tell me if a CPU bottleneck on your Namenode is causing you to
> encounter all these timeouts?
> 
> Thanks,
> dhruba
> 
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 13, 2007 12:14 PM
> To: [email protected]
> Subject: RE: ipc.client.timeout
> 
> I would love to use a lower timeout. It seems that retries are either
> buggy or missing in some cases, which causes lots of failures. The cases
> I can see right now (0.13.1):
> 
> - namenode.complete: looks like it retries - but may not be idempotent?
> 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
> complete write to file
> /user/facebook/profiles/binary/users_joined/_task_0018_r_000003_0/.part-
> 00003.crc by DFSClient_task_0018_r_000003_0
> at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:353)
> 
> 
> - namenode.addBlock: no retry policy (looking at DFSClient.java)
> - namenode.mkdirs: no retry policy ('')
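
If retries were ever to be added selectively here, one option is a
per-method policy, so that only calls known to be idempotent get more
than one attempt. A hypothetical sketch using a dynamic proxy - not what
DFSClient.java does in 0.13.1:

    import java.io.IOException;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.Map;

    /** Hypothetical: wraps an RPC interface so that only the methods
     *  named in 'retries' are retried on IOException; everything else
     *  (addBlock, mkdirs, ...) gets a single attempt. */
    public final class PerMethodRetryProxy implements InvocationHandler {
        private final Object target;
        private final Map<String, Integer> retries; // name -> max attempts

        private PerMethodRetryProxy(Object target,
                                    Map<String, Integer> retries) {
            this.target = target;
            this.retries = retries;
        }

        @SuppressWarnings("unchecked")
        public static <T> T wrap(Class<T> iface, T target,
                                 Map<String, Integer> retries) {
            return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                    new Class<?>[] { iface },
                    new PerMethodRetryProxy(target, retries));
        }

        @Override
        public Object invoke(Object proxy, Method m, Object[] args)
                throws Throwable {
            // default to a single attempt for any method not listed
            int attempts = Math.max(1, retries.getOrDefault(m.getName(), 1));
            Throwable last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    return m.invoke(target, args);
                } catch (InvocationTargetException e) {
                    if (!(e.getCause() instanceof IOException)) {
                        throw e.getCause();
                    }
                    last = e.getCause(); // maybe transient; retry if allowed
                }
            }
            throw last;
        }
    }

Wrapping the namenode's client interface this way would let complete get
a few attempts while leaving the non-idempotent calls alone.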
> 
> We see plenty of all of these with a lowered timeout. With a high
> timeout, we have seen very slow recovery from some failures (jobs would
> hang on submission).
> 
> I don't understand the fs protocol well enough - any idea if these are
> fixable?
> 
> Thx,
> 
> Joydeep
> 
> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 05, 2007 1:00 AM
> To: [email protected]
> Subject: RE: ipc.client.timeout
> 
> This is to take care of cases where a particular server is too loaded to
> respond to client RPCs quickly enough. Setting the timeout to a large
> value ensures that RPCs won't time out that often, which potentially
> leads to fewer failures and retries (e.g., a map/reduce task kills
> itself when it fails to invoke an RPC on the tasktracker three times in
> a row).
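
For anyone tuning this: the knob under discussion is the
ipc.client.timeout property, in milliseconds. A minimal sketch of
lowering it programmatically, assuming the standard Configuration API
(it can equally be set in the site configuration file):

    import org.apache.hadoop.conf.Configuration;

    public class TimeoutExample {
        public static void main(String[] args) {
            // Lower the RPC timeout from the 60s default to 10s.
            // Whether a small value is safe depends on how loaded the
            // namenode is, per the discussion above.
            Configuration conf = new Configuration();
            conf.setInt("ipc.client.timeout", 10000);
        }
    }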
> 
>> -----Original Message-----
>> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, September 05, 2007 12:26 PM
>> To: [email protected]
>> Subject: ipc.client.timeout
>> 
>> The default is set to 60s. Many of my dfs -put commands would
>> seem to hang, and lowering the timeout (to 1s) seems to
>> have made things a whole lot better.
>> 
>>  
>> 
>> General curiosity - isn't 60s just huge for an RPC timeout? (A
>> web search indicates that Nutch may be setting it to 10s, and
>> even that seems fairly large.) Would love to get a
>> backgrounder on why the default is set to so large a value.
>> 
>>  
>> 
>> Thanks,
>> 
>>  
>> 
>> Joydeep
>> 
