The more I look into it, the weirder it gets.

Gavin McCullagh wrote:
> On Wed, 27 Jan 2010, Dirk H. Schulz wrote:
>
>   
>> Telnetting from external-fd to server-sd using the above-mentioned FQDN
>> and the port of the storage daemon (telnet storage.server.sd 9103)
>> outputs exactly the same as telnetting internally to that port.  Afaik,
>> that means: bacula-fd on the external client should be able to connect to
>> bacula-sd on the internal server.
>>
>> But it does not. When I run a backup job for this client, the director spends 
>> quite a long time "waiting for Client ... to connect to Storage ..." and 
>> eventually gives up.
>>     
>
> In this instance, I would be inclined to start a tcpdump like that below on
> both the -fd and -sd, start your backup and see where exactly the -fd tries
> to connect to.
>       tcpdump -ni ethX tcp port 9103
>
> The first question I suppose is to see what IP address the -fd is actually
> using to connect.  The second is does the tcp handshake happen correctly
> and if so what happens then.  Perhaps the -fd is connecting to the wrong
> IP, or it could be a firewall issue, or something else...?  
>   
First, I ran the test with all firewalls along the path shut down (except 
the one doing NAT) to rule out any issues there.
Then I ran a similar test with a different client-fd in the same public 
subnet, and it worked.
I have thoroughly compared the configuration of these two clients (both 
bacula-fd.conf and bacula-dir.conf).

Still, the original client does not work. Here is what tcpdump and bacula-dir output:

> external-fd:~ root# tcpdump -ni en1 portrange 9101-9103
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on en1, link-type EN10MB (Ethernet), capture size 96 bytes
> 08:01:39.346580 IP 1.2.3.4.32930 > 40.50.60.70.9102: S 
> 1415949915:1415949915(0) win 5840 <mss 1452,sackOK,timestamp 939681199 
> 0,nop,wscale 7>
> 08:01:39.346647 IP 40.50.60.70.9102 > 1.2.3.4.32930: S 
> 221080258:221080258(0) ack 1415949916 win 65535 <mss 1460,nop,wscale 
> 3,nop,nop,timestamp 213499348 939681199,sackOK,eol>
> 08:01:39.387055 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 
> <nop,nop,timestamp 939681241 213499348>
> 08:01:39.387073 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 1 win 65535 
> <nop,nop,timestamp 213499348 939681241>
> 08:01:39.391051 IP 1.2.3.4.32930 > 40.50.60.70.9102: P 1:51(50) ack 1 
> win 46 <nop,nop,timestamp 939681244 213499348>
> 08:01:39.391065 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 
> 65535 <nop,nop,timestamp 213499348 939681244>
> 08:06:22.221818 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
> 08:06:22.221853 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 
> 65535 <nop,nop,timestamp 213502176 939681244>
> 08:06:22.262232 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 
> <nop,nop,timestamp 939964161 213499348>
> 08:11:07.236737 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
> 08:11:07.236780 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 
> 65535 <nop,nop,timestamp 213505026 939964161>
> 08:11:07.279418 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 
> <nop,nop,timestamp 940249226 213502176>
> 08:11:44.501513 IP 1.2.3.4.32930 > 40.50.60.70.9102: F 51:51(0) ack 1 
> win 46 <nop,nop,timestamp 940286454 213505026>
> 08:11:44.501542 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 52 win 
> 65535 <nop,nop,timestamp 213505399 940286454>
The whole time bacula-dir claims to be "waiting for Client external-fd to 
connect to Storage LTO2", there is not a single attempt by this client to 
connect to the SD!
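
To double-check that reading of the capture, the packets can be counted per Bacula port. This is only a minimal sketch: it assumes the tcpdump text output was saved or pasted as lines, that the daemons use the default ports (9101 director, 9102 file daemon, 9103 storage daemon), and the function name packets_per_daemon_port is made up for illustration.

```python
import re
from collections import Counter

# Default Bacula ports (assumption: the setup does not override them).
BACULA_PORTS = {9101: "director", 9102: "file daemon", 9103: "storage daemon"}

# Matches the "IP a.b.c.d.sport > e.f.g.h.dport:" part of a `tcpdump -n` line.
PACKET_RE = re.compile(
    r"IP \d+\.\d+\.\d+\.\d+\.(\d+) > \d+\.\d+\.\d+\.\d+\.(\d+):"
)


def packets_per_daemon_port(lines):
    """Count packets whose source or destination port is a Bacula port."""
    counts = Counter({port: 0 for port in BACULA_PORTS})
    for line in lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # wrapped continuation line or tcpdump banner
        for port in {int(match.group(1)), int(match.group(2))}:
            if port in BACULA_PORTS:
                counts[port] += 1
    return counts


# A few packet header lines taken from the capture above.
sample = [
    "08:01:39.346580 IP 1.2.3.4.32930 > 40.50.60.70.9102: S",
    "08:01:39.346647 IP 40.50.60.70.9102 > 1.2.3.4.32930: S",
    "08:11:44.501513 IP 1.2.3.4.32930 > 40.50.60.70.9102: F 51:51(0) ack 1",
]

for port, count in sorted(packets_per_daemon_port(sample).items()):
    print(f"{port} ({BACULA_PORTS[port]}): {count} packets")
```

A count of zero for 9103 means the capture contains no traffic to or from the storage daemon port at all, which matches what the dump shows.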

And in the end, the error message from bacula-dir points to something different:

> 8-Jan 08:11 bacula-dir JobId 33: Fatal error: Unable to authenticate 
> with File daemon at "external-fd.domain.de:9102". Possible causes:
> Passwords or names not the same or
> Maximum Concurrent Jobs exceeded on the FD or
> FD networking messed up (restart daemon).
> Please see 
> http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000
>  
> for help.
> 28-Jan 08:11 bacula-dir JobId 33: Fatal error: Network error with FD 
> during Backup: ERR=Unterbrechung während des Betriebssystemaufrufs
> 28-Jan 08:11 bacula-dir JobId 33: Fatal error: No Job status returned 
> from FD.
> 28-Jan 08:11 bacula-dir JobId 33: Error: Bacula bacula-dir 3.0.3 
> (18Oct09): 28-J
(The German ERR text above means "Interrupted system call".)
I have even tried without any passwords, and I have copied and pasted the 
client name everywhere to make sure there is no typo in it.

And then, out of pure desperation, I started bacula-fd manually 
instead of via launchd (with the same parameters launchd is given), and 
now it works!

Somehow communication does not work correctly if bacula-fd is started 
via launchd (/sbin/bacula-fd -f -c /etc/bacula/bacula-fd.conf).
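
One way to narrow this down might be to let launchd run the very same command with stdout and stderr redirected to a file, so that whatever the daemon prints under launchd becomes visible. The following is only a rough sketch, not the job definition actually in use here: the label org.example.bacula-fd.debug and the log path are invented, and the script can be run on any machine with Python 3 to generate the plist text.

```python
import plistlib


def build_debug_job(label, program_args, log_path):
    """Return an XML launchd job definition that logs stdout/stderr to a file."""
    job = {
        "Label": label,                      # hypothetical job label
        "ProgramArguments": list(program_args),
        "RunAtLoad": True,
        "StandardOutPath": log_path,         # capture what the daemon prints
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)  # bytes, XML format by default


plist_bytes = build_debug_job(
    "org.example.bacula-fd.debug",           # assumption, not the real label
    ["/sbin/bacula-fd", "-f", "-c", "/etc/bacula/bacula-fd.conf"],
    "/var/log/bacula-fd-launchd.log",        # assumption, any writable path
)
print(plist_bytes.decode("utf-8"))
```

Comparing that log with the output of the manually started daemon could show whether bacula-fd behaves differently under launchd; it does not by itself explain the failure.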

Has anyone seen this before? Is there a workaround? The client runs 
Mac OS X 10.5.5 on Intel (uname -a outputs "Darwin external-fd.domain.de 9.5.0 
Darwin Kernel Version 9.5.0: Wed Sep  3 11:29:43 PDT 2008; 
root:xnu-1228.7.58~1/RELEASE_I386 i386").

Any help or hint would be greatly appreciated!

Dirk


