Nope, disabling tso and tx changed nothing. -John
On Fri, Apr 17, 2009 at 12:24:25PM -0400, John Lockard wrote: > On Fri, Apr 17, 2009 at 09:39:24AM +1000, James Harper wrote: > > Does belobog have the same network adapter and kernel as your other > > servers? > > It has the same ethernet (Intel e1000) as other servers and similar > kernel, but is now a mostly legacy system. Professor has ongoing > research which disallows me from upgrading. Unfortunately other systems > of that same "class" don't have nearly as much data for backing up. > > > What brand (machine and network adapter), what kernel, and what network > > adapter? > > Machine is a Dell PowerEdge 6650, Kernel is: > 2.6.13-15.12-smp x86_64 > > > Could be a tcp offload issue, try disabling tso and tx csum offload on > > the fd (use ethtool) > > Is this correct command? > ethtool -K eth0 tx off tso off > > > > Could also be a firewall issue timing out the connection... are the sd, > > dir, and fd on the same lan segment? > > The SD and DIR are the same machine. The FD is on a different LAN > segment. > > > James > > -John > > > > > > > > > -----Original Message----- > > > From: John Lockard [mailto:jlock...@umich.edu] > > > Sent: Friday, 17 April 2009 01:38 > > > To: bacula-users@lists.sourceforge.net > > > Subject: [Bacula-users] Backup failing reliably repeatable > > > > > > Client and server at 2.4.4. Both client and server are Linux 2.6 > > > > > > Logs from client: > > > -------------------- > > > 15-Apr 16:53 tibor-dir JobId 2954: Start Backup JobId 2954, > > Job=Belobog-Data- > > > Users.2009-04-15_16.32.07.04 > > > 15-Apr 16:53 tibor-dir JobId 2954: Using Volume "100108L2" from > > 'Scratch' > > > pool. > > > 15-Apr 16:53 tibor-dir JobId 2954: Using Device "NEO-LTO-1" > > > 15-Apr 16:53 tibor-sd JobId 2954: 3307 Issuing autochanger "unload > > slot 3, > > > drive 0" command. > > > 15-Apr 16:53 tibor-sd JobId 2954: 3304 Issuing autochanger "load slot > > 21, > > > drive 0" command. > > > 15-Apr 16:54 tibor-sd JobId 2954: 3305 Autochanger "load slot 21, > > drive 0", > > > status is OK. > > > 15-Apr 16:54 tibor-sd JobId 2954: Wrote label to prelabeled Volume > > "100108L2" > > > on device "NEO-LTO-1" (/dev/nst0) > > > 15-Apr 16:54 tibor-sd JobId 2954: Spooling data ... > > > 15-Apr 18:22 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 18:22 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,786,126 bytes ... > > > 15-Apr 18:32 tibor-sd JobId 2954: Despooling elapsed time = 00:09:57, > > Transfer > > > rate = 46.80 M bytes/second > > > 15-Apr 18:33 tibor-sd JobId 2954: Spooling data again ... > > > 15-Apr 19:29 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 19:29 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,794,024 bytes ... > > > 15-Apr 19:40 tibor-sd JobId 2954: Despooling elapsed time = 00:11:24, > > Transfer > > > rate = 40.85 M bytes/second > > > 15-Apr 19:41 tibor-sd JobId 2954: Spooling data again ... > > > 15-Apr 20:28 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 20:28 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,795,119 bytes ... > > > 15-Apr 20:42 tibor-sd JobId 2954: Despooling elapsed time = 00:14:30, > > Transfer > > > rate = 32.11 M bytes/second > > > 15-Apr 20:43 tibor-sd JobId 2954: Spooling data again ... > > > 15-Apr 21:27 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 21:27 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,795,083 bytes ... > > > 15-Apr 21:44 tibor-sd JobId 2954: Despooling elapsed time = 00:16:34, > > Transfer > > > rate = 28.11 M bytes/second > > > 15-Apr 21:45 tibor-sd JobId 2954: Spooling data again ... > > > 15-Apr 22:29 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 22:29 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,795,185 bytes ... > > > 15-Apr 22:43 tibor-sd JobId 2954: Despooling elapsed time = 00:14:28, > > Transfer > > > rate = 32.19 M bytes/second > > > 15-Apr 22:44 tibor-sd JobId 2954: Spooling data again ... > > > 15-Apr 23:29 tibor-sd JobId 2954: User specified spool size reached. > > > 15-Apr 23:29 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,793,858 bytes ... > > > 15-Apr 23:42 tibor-sd JobId 2954: Despooling elapsed time = 00:12:54, > > Transfer > > > rate = 36.10 M bytes/second > > > 15-Apr 23:43 tibor-sd JobId 2954: Spooling data again ... > > > 16-Apr 00:26 tibor-sd JobId 2954: User specified spool size reached. > > > 16-Apr 00:26 tibor-sd JobId 2954: Writing spooled data to Volume. > > Despooling > > > 27,941,794,859 bytes ... > > > 16-Apr 00:43 tibor-sd JobId 2954: Despooling elapsed time = 00:16:46, > > Transfer > > > rate = 27.77 M bytes/second > > > 16-Apr 00:43 tibor-sd JobId 2954: Spooling data again ... > > > 16-Apr 02:07 belobog-fd JobId 2954: Fatal error: backup.c:1087 Network > > send > > > error to SD. ERR=Connection reset by peer > > > 16-Apr 02:07 tibor-dir JobId 2954: Error: Bacula tibor-dir 2.4.4 > > (28Dec08): > > > 16-Apr-2009 02:07:38 > > > Build OS: x86_64-unknown-linux-gnu redhat Enterprise > > release > > > JobId: 2954 > > > Job: Belobog-Data-Users.2009-04-15_16.32.07.04 > > > Backup Level: Full > > > Client: "belobog" 2.4.4 (28Dec08) > > x86_64-unknown-linux- > > > gnu,suse,10.0 > > > FileSet: "belobog-data-users" 2009-04-11 05:15:01 > > > Pool: "Radev-Full" (From Job FullPool override) > > > Storage: "Overland_Neo_4000" (From Pool resource) > > > Scheduled time: 15-Apr-2009 16:32:02 > > > Start time: 15-Apr-2009 16:53:13 > > > End time: 16-Apr-2009 02:07:38 > > > Elapsed time: 9 hours 14 mins 25 secs > > > Priority: 30 > > > FD Files Written: 7,029,945 > > > SD Files Written: 0 > > > FD Bytes Written: 217,674,494,521 (217.6 GB) > > > SD Bytes Written: 0 (0 B) > > > Rate: 6543.6 KB/s > > > Software Compression: None > > > VSS: no > > > Storage Encryption: no > > > Volume name(s): 100108L2 > > > Volume Session Id: 50 > > > Volume Session Time: 1239651099 > > > Last Volume Bytes: 194,987,520,000 (194.9 GB) > > > Non-fatal FD errors: 0 > > > SD Errors: 0 > > > FD termination status: Error > > > SD termination status: Error > > > Termination: *** Backup Error *** > > > -------------------- > > > > > > On the client side I see this in the firewall log: > > > Apr 16 02:07:38 belobog kernel: SFW2-OUT-ERROR IN= OUT=eth0 > > SRC=xxx.xxx.xxx.32 > > > DST=xxx.xxx.yyy.25 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=59630 DF > > PROTO=TCP > > > SPT=11310 DPT=9103 WINDOW=1460 RES=0x00 ACK RST URGP=0 OPT > > > (0101080A3D08C91922123CB5) > > > > > > (SRC and DST have been cleansed in the firewall log by me) > > > > > > Looking at information on SuSEfirewall (iptables with wrappers) I'm > > led to > > > believe > > > the firewall log message is just a notice. Basically that the > > connection was > > > dropped, and not by the firewall. > > > > > > I can repeat this over and over on this same machine with a backup of > > this > > > partition or any other partition which is at least 300G in size (I > > don't > > > have any partitions larger than 217G but smaller than 200G). All of > > these > > > backups fail at roughly 217G as the FD Bytes Written. > > > > > > This backup runs from the client to the backup server with a 100G > > spool > > > area. From the spool, the data is written to LTO2 tape. Backups on > > this > > > system failed at 217G when my spool was 50G, 10G and 5G in size. > > > > > > I have other servers with more storage on which backups run fine, up > > to > > > 4TB. > > > > > > Any ideas? > > > > > > Thanks, > > > John > > > > > > -- > > > Brain: Pinky, Are you pondering what I'm pondering? > > > Pinky: Well, I think so, Brain, but I can't memorize > > > a whole opera in Yiddish. > > > ------------------------------------------------------------------- > > > John M. Lockard | U of Michigan - School of Information > > > Unix and Security Admin | 1214 SI North - 1075 Beal Ave. > > > jlock...@umich.edu | Ann Arbor, MI 48109-2112 > > > www.umich.edu/~jlockard | 734-615-8776 | 734-647-8045 FAX > > > ------------------------------------------------------------------- > > > > > > > > ------------------------------------------------------------------------ > > ------ > > > Stay on top of everything new and different, both inside and > > > around Java (TM) technology - register by April 22, and save > > > $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco. > > > 300 plus technical and hands-on sessions. Register today. > > > Use priority code J9JMT32. http://p.sf.net/sfu/p > > > _______________________________________________ > > > Bacula-users mailing list > > > Bacula-users@lists.sourceforge.net > > > https://lists.sourceforge.net/lists/listinfo/bacula-users > > > > > > -- > Brain: Pinky, Are you pondering what I'm pondering? > Pinky: Uh, I think so Brain, but you'd have to take > that whole bridge apart then, wouldn't you? > ------------------------------------------------------------------- > John M. Lockard | U of Michigan - School of Information > Unix and Security Admin | 1214 SI North - 1075 Beal Ave. > jlock...@umich.edu | Ann Arbor, MI 48109-2112 > www.umich.edu/~jlockard | 734-615-8776 | 734-647-8045 FAX > ------------------------------------------------------------------- > > ------------------------------------------------------------------------------ > Stay on top of everything new and different, both inside and > around Java (TM) technology - register by April 22, and save > $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco. > 300 plus technical and hands-on sessions. Register today. > Use priority code J9JMT32. http://p.sf.net/sfu/p > _______________________________________________ > Bacula-users mailing list > Bacula-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-users > > -- "I think the best pictures are often on the edges of any situation, I don't find photographing the situation nearly as interesting as photographing the edges." - William Albert Allard ------------------------------------------------------------------- John M. Lockard | U of Michigan - School of Information Unix and Security Admin | 1214 SI North - 1075 Beal Ave. jlock...@umich.edu | Ann Arbor, MI 48109-2112 www.umich.edu/~jlockard | 734-615-8776 | 734-647-8045 FAX ------------------------------------------------------------------- ------------------------------------------------------------------------------ Stay on top of everything new and different, both inside and around Java (TM) technology - register by April 22, and save $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco. 300 plus technical and hands-on sessions. Register today. Use priority code J9JMT32. http://p.sf.net/sfu/p _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users