The more I look into it, the weirder it gets.

Gavin McCullagh wrote:
> On Wed, 27 Jan 2010, Dirk H. Schulz wrote:
>
>> Telnetting from external-fd to server-sd using the above mentioned FQDN
>> and the port of the storage daemon (telnet storage.server.sd 9103)
>> outputs exactly the same as telnetting internally to that port. Afaik,
>> that means: bacula-fd on the external client should be able to connect to
>> bacula-sd on the internal server.
>>
>> But it does not. Running a backup job for this client the director is
>> quite a long time "waiting for Client ... to connect to Storage ..." and
>> eventually gives up.
>
> In this instance, I would be inclined to start a tcpdump like that below on
> both the -fd and -sd, start your backup and see where exactly the -fd tries
> to connect to.
>
>   tcpdump -ni ethX tcp port 9103
>
> The first question I suppose is to see what IP address the -fd is actually
> using to connect. The second is does the tcp handshake happen correctly
> and if so what happens then. Perhaps the -fd is connecting to the wrong
> IP, or it could be a firewall issue, or something else...?

First: I ran the test with every firewall along the path shut down (except the one doing NAT) to rule out any issues from there.

Then I ran a similar test with a different client-fd in the same public subnet, and it worked. I have thoroughly compared the configuration of these two clients (both bacula-fd.conf and bacula-dir.conf).
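For anyone who wants to repeat the telnet check from a script, here is a minimal sketch in Python. The helper name `can_connect` is made up for illustration; the host name and port are the ones from the telnet test quoted above. A successful connect only shows that the port is reachable from the machine running the script. It does not show which address the FD actually chooses during a job.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: roughly what "telnet host port" verifies,
    without reading or sending any Bacula protocol data.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on the external client, `can_connect("storage.server.sd", 9103)` would correspond to the telnet test against the storage daemon.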
Still, nothing works. Here is what tcpdump and bacula-dir output:

> external-fd:~ root# tcpdump -ni en1 portrange 9101-9103
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on en1, link-type EN10MB (Ethernet), capture size 96 bytes
> 08:01:39.346580 IP 1.2.3.4.32930 > 40.50.60.70.9102: S 1415949915:1415949915(0) win 5840 <mss 1452,sackOK,timestamp 939681199 0,nop,wscale 7>
> 08:01:39.346647 IP 40.50.60.70.9102 > 1.2.3.4.32930: S 221080258:221080258(0) ack 1415949916 win 65535 <mss 1460,nop,wscale 3,nop,nop,timestamp 213499348 939681199,sackOK,eol>
> 08:01:39.387055 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 939681241 213499348>
> 08:01:39.387073 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 1 win 65535 <nop,nop,timestamp 213499348 939681241>
> 08:01:39.391051 IP 1.2.3.4.32930 > 40.50.60.70.9102: P 1:51(50) ack 1 win 46 <nop,nop,timestamp 939681244 213499348>
> 08:01:39.391065 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213499348 939681244>
> 08:06:22.221818 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
> 08:06:22.221853 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213502176 939681244>
> 08:06:22.262232 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 939964161 213499348>
> 08:11:07.236737 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
> 08:11:07.236780 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213505026 939964161>
> 08:11:07.279418 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 940249226 213502176>
> 08:11:44.501513 IP 1.2.3.4.32930 > 40.50.60.70.9102: F 51:51(0) ack 1 win 46 <nop,nop,timestamp 940286454 213505026>
> 08:11:44.501542 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 52 win 65535 <nop,nop,timestamp 213505399 940286454>

All the while bacula-dir claims "waiting for Client external-fd to connect to Storage LTO2", the capture shows not a single attempt by this client to connect to the SD. The only traffic is the one connection to the FD port 9102; nothing ever goes to port 9103.

And in the end the error message from bacula-dir is something different:

> 8-Jan 08:11 bacula-dir JobId 33: Fatal error: Unable to authenticate with File daemon at "external-fd.domain.de:9102". Possible causes:
> Passwords or names not the same or
> Maximum Concurrent Jobs exceeded on the FD or
> FD networking messed up (restart daemon).
> Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000 for help.
> 28-Jan 08:11 bacula-dir JobId 33: Fatal error: Network error with FD during Backup: ERR=Unterbrechung während des Betriebssystemaufrufs
> 28-Jan 08:11 bacula-dir JobId 33: Fatal error: No Job status returned from FD.
> 28-Jan 08:11 bacula-dir JobId 33: Error: Bacula bacula-dir 3.0.3 (18Oct09): 28-J

(The German ERR text means "Interrupted system call".)

I have even tried without any passwords, and I have copied and pasted the client name everywhere to make sure there is no typo in it.

And then, out of pure desperation, I started bacula-fd manually instead of via launchd (with the same parameters launchd is given), and now it works!

Somehow communication does not work correctly if bacula-fd is started via launchd (/sbin/bacula-fd -f -c /etc/bacula/bacula-fd.conf).

Has anyone seen that before? Is there a workaround?

It is MacOS X Client 10.5.5 Intel (uname -a outputs "Darwin external-fd.domain.de 9.5.0 Darwin Kernel Version 9.5.0: Wed Sep 3 11:29:43 PDT 2008; root:xnu-1228.7.58~1/RELEASE_I386 i386").

Any help or hint would be greatly appreciated!
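As a cross-check on the capture above, the ports that actually show up can be pulled out mechanically rather than by eye. This is only a rough sketch: the function name `ports_seen` is hypothetical, and the regular expression assumes IPv4 lines in the `tcpdump -n` format shown in this mail ("IP src.port > dst.port:").

```python
import re

# Matches "IP a.b.c.d.sport > e.f.g.h.dport:" as printed by tcpdump -n (IPv4 only).
PACKET = re.compile(r"IP (?:\d+\.){4}(\d+) > (?:\d+\.){4}(\d+):")


def ports_seen(lines):
    """Return the set of TCP ports (source and destination) found in the lines.

    Lines that do not look like packet lines (banner text, wrapped
    continuations) are skipped.
    """
    ports = set()
    for line in lines:
        match = PACKET.search(line)
        if match:
            ports.add(int(match.group(1)))
            ports.add(int(match.group(2)))
    return ports
```

Fed the packet lines of the capture, with each packet on one line, checking `9103 in ports_seen(lines)` answers whether the client ever talked to the storage daemon port.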
Dirk