Ahhh. The parallel processing in << common-src/bsdtcp-security.c,
function runbsdtcp >> *also* does not connect asynchronously, and it is
*also* giving me a 3-minute TCP wait if the node is actually down. Both
bits of code have comments saying they connect asynchronously, but
neither one is doing so yet.
I understand; it takes real effort to code that! I've only gotten as far
as realizing that both bsdtcp and krb5 go through this same spot in the
code: << common-src/util.c, function connect_port >>.
I *had* already realized that both bsdtcp and krb5 clients were giving me
3-minute TCP waits (later multiplied by the "connect-tries" parameter in
my config, which defaults to 3). I did try setting that to 1 and lost all
my other (up) clients anyway. It seems to be the first wait that disrupts
all the other clients.

So: TCP-based connections (which means bsdtcp and krb5 and maybe others,
but not bsd) wait for the system-default TCP connect timeout, which is
3 minutes 9 seconds here, times "connect-tries", if a node is offline.
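
The 3 minutes 9 seconds would be consistent with SYN retransmission
backoff: an initial timeout of 3 seconds, doubled on each of 5 retries,
adds up to 189 seconds. Those two values are my assumption (I believe
they are the defaults on older Linux kernels; I have not checked this
server's settings), and syn_wait_seconds is just a made-up name for the
arithmetic:

```c
/* Total seconds a blocking connect() waits when no SYN is ever answered:
 * the initial SYN plus `retries` retransmissions, with the timeout
 * doubling after each one. Assumed values, not read from any system. */
unsigned syn_wait_seconds(unsigned initial_rto, unsigned retries)
{
    unsigned total = 0;
    unsigned rto = initial_rto;
    for (unsigned i = 0; i <= retries; i++) {
        total += rto;   /* wait out the current timeout */
        rto *= 2;       /* exponential backoff */
    }
    return total;
}
```

syn_wait_seconds(3, 5) is 3+6+12+24+48+96 = 189 seconds, i.e. 3 minutes
9 seconds. Times the default connect-tries of 3, that is 567 seconds, or
nearly nine and a half minutes per dead client.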

If I didn't have any bsd clients (UDP connections), would this not be
bothering me? Do those of you with only bsdtcp and/or krb5 clients have
no problems if a node is offline?

(If so, then I'll push to upgrade those older clients instead of trying
to rewrite the code!)

Deb Baddorf
Fermilab  


On Sep 23, 2014, at 10:56 AM, Jean-Louis Martineau <[email protected]> wrote:

> Debra,
> 
> The patch created other problems and was later reverted (I don't remember 
> what the problem was).
> 
> You can try it if you want:
> In common-src/krb5-security.c, function runkrb5,
> change the last argument of stream_client from 0 to 1.
> 
> Jean-Louis
> 
> On 09/22/2014 04:19 PM, Debra S Baddorf wrote:
>> I seem to recall a patch for this, but I can't find it now. It has finally 
>> happened firmly enough that I can reproduce it, on amcheck at least:
>> 
>> amcheck (and sometimes amdump) hangs indefinitely if the client is powered 
>> down.
>> 
>> amanda v3.3.6  server    (happened on 3.3.3 too so I upgraded, but it still 
>> happens)
>> 2 clients - are powered down so version cannot matter!
>> both are (were) using  auth=krb5
>> other krb5 clients work fine  and have done so for more than a year
>> 
>> The server's log doesn't have anything useful; I killed the process at 
>> about 15:10.
>> 
>> 
>> server/daily/amcheck.20140922150805.debug
>> ::::::::::::::
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck: pid 3893 ruid 0 euid 11 
>> version 3.3.6: start at Mon Sep 22 15:08:05 2014
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck: pid 3893 ruid 0 euid 11 
>> version 3.3.6: rename at Mon Sep 22 15:08:05 2014
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck-clients: 
>> security_getdriver(name=krb5) returns 0x193240
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck-clients: 
>> security_handleinit(handle=0x96251c8, driver=0x193240 (KRB5))
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck-clients: 
>> security_streaminit(stream=0x9625750, driver=0x193240 (KRB5))
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck-clients: make_socket 
>> opening socket with family 2
>> Mon Sep 22 15:08:05 2014: thd-0x9550330: amcheck-clients: connect_port: Try  
>> port 50000: available - Success
>> 
>> Can somebody point me to the patch I remember hearing about?
>> 
>> Deb Baddorf
>> Fermilab
> 

