Hi everybody,
I am following Arno's suggestion to raise here a problem with writing to additional tapes, which started with 1.38.11 and persists in 2.0.2. Arno's idea is that the problem could be in the tape hardware or the OS, and maybe he is right, but the same hardware and OS did not have the problem with 1.36.3 (a hardware failure in the meantime is also possible, but everything else apparently works).
What I think happens is this: once an ENOSPC error has occurred while writing a tape block, the last block is correctly re-read and the autochanger swaps in a prelabelled tape. The label is then rewritten (or skipped, I do not know), and when the first block is written to the new tape it gets the ENOSPC error again. This is reported again as an EOT, but the last-block check fails and the job fails too.
I have also made a very small modification to the block.c routine (I am not a programmer, let alone a C programmer, so this change is surely not correct, even if it works around the problem in some way). The change is:

532d531
<    if (dev->file == 0) { dev->clrerror(-1); }

which simply means (I hope) "clear all errors before writing the block if this is the first tape file".
This is needed because writes keep getting the error until an EOF mark is written. Since a tape file is normally around 1 GB, this slows performance down to 20-25% during this phase, but I can live with that.
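
The effect of the one-line change can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_tape` and all `toy_*` functions are hypothetical stand-ins, not Bacula's device code or its `clrerror()`, and the `stale_eod` flag merely imitates a driver state that survives the tape change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tape device (not Bacula's device class). */
struct toy_tape {
    unsigned file;    /* current tape file number, 0 = first file */
    unsigned block;   /* blocks written to the current file */
    bool stale_eod;   /* end-of-data state left over from the previous tape */
};

/* Simulate a media change that does NOT reset the stale state. */
void toy_change_tape(struct toy_tape *t)
{
    assert(t != NULL);
    t->file = 0;
    t->block = 0;
    /* stale_eod is deliberately left untouched */
}

/* Stand-in for clearing the driver's error state. */
void toy_clear_error(struct toy_tape *t)
{
    assert(t != NULL);
    t->stale_eod = false;
}

/* Write one block: fails with ENOSPC while the stale state is set. */
int toy_write_block(struct toy_tape *t)
{
    assert(t != NULL);
    if (t->stale_eod) {
        errno = ENOSPC;
        return -1;
    }
    t->block++;
    return 0;
}

/* The workaround: on file 0, clear errors before every block write. */
int toy_write_block_patched(struct toy_tape *t)
{
    assert(t != NULL);
    if (t->file == 0) {
        toy_clear_error(t);
    }
    return toy_write_block(t);
}
```

In this model the unpatched write fails on the fresh tape exactly as in the job log, while the patched write succeeds. The model says nothing about why the real driver keeps the state, nor about the cost of clearing errors on every block.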

What I would like to know is whether, in your opinion, this behaviour is really a hardware/firmware/OS problem, and whether the block routine could be made more resilient in some way (an EOF mark at tape end, closing and reopening the fd, or whatever).
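
For the "closing and reopening the fd" idea, a minimal sketch using only POSIX calls might look like the following. The helper name `reopen_device` is hypothetical and nothing here is taken from Bacula; whether a reopen actually resets the condition depends on the tape driver and has not been tested.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close fd and open path again with the given flags.
 * Returns the new descriptor, or -1 with errno set by close() or open(). */
int reopen_device(int fd, const char *path, int flags)
{
    if (close(fd) == -1) {
        return -1;
    }
    return open(path, flags);
}
```

A caller would use it right after the autochanger reports the new tape as loaded, for example `fd = reopen_device(fd, "/dev/lto1", O_RDWR);`. Keep in mind that closing a tape device can have side effects of its own (a rewinding device node rewinds on close), so this is not a drop-in fix.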

Many thanks if you would like to give me an answer, and very many thanks anyway for this great package.



--------------------------------------------------------------------------
Ferdinando Pasqualetti
G.T.Dati srl
Tel. 0557310862 - 3356172731 - Fax 055720143



Hi,

On 3/11/2007 6:33 PM, Ferdinando Pasqualetti wrote:
>
> Hi Arno,
> I made some tests and this is what I think.
> When there is a tape change after an out-of-space error, subsequent block
> writes continue to get that error even after the tape change by the
> robot. This continues.
> I made some changes to the block.c routine

You'd better discuss this at bacula-devel, I think, or send Kern a mail
explaining the problem and the resolution.

> (very simple, because I'm not
> a C programmer and I also don't know the logic of the SD program). I made
> the routine enter the retry loop even for ENOSPC if the file number is 0.
> This made bacula-sd work correctly (but it took 20 hours to write file
> 0). After writing the EOF mark speed is normal again.
> My idea is that changing the tape does not reset the EOD condition on
> the tape

That sounds like a bug, either in the hardware or the HBA driver.

> until a file mark is written. I do not know if this is a device or OS
> error, but I believe that the FD of the tape should be closed
> and reopened on a tape change.

I know next to nothing about these details, so I won't comment on it...

> dd and mt tests always gave correct results, but dd always writes an EOF
> mark at the end of the transfer.
>
> If you have any ideas about this I will be very happy. Thank you very
> much in any case.

Difficult problem, I think.

If this is a hardware or driver problem, I don't think modifying the SD
code is the right solution.

If it works for you - fine, but it might be that you have to manage that
patch for your installation yourself.

Arno

>  
> --------------------------------------------------------------------------
> Ferdinando Pasqualetti
> G.T.Dati srl
> Tel. 0557310862 - 3356172731 - Fax 055720143
>
>
>
>
>
> *Ferdinando Pasqualetti/San Lazzaro/Conserve Italia*
> 27/02/2007 09.47
>
> To: Arno Lehmann <[EMAIL PROTECTED]>
> CC: bacula-users <bacula-users@lists.sourceforge.net>
> Subject: Re: Re: [Bacula-users] Change tape problem
>
> Hi Arno,
> thank you very much for your answer. I will try asap the tests you are
> suggesting. By the way, I purged the volumes involved in the error shown
> in the original message (it was the third try), restarted the backup job
> and here is the (correct) result.
>
> 25-feb 19:55 bacula-dir: Start Backup JobId 12927,
> Job=webfs3-job.2007-02-25_19.55.40
> 25-feb 19:55 bacula-dir: Recycled volume "web-004"
> 25-feb 19:55 webfs3: ClientRunBeforeJob: run command "/root/restartsmb"
> 25-feb 19:55 webfs3: ClientRunBeforeJob: Shutting down SMB services: [
>  OK  ]
> 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: nessun processo terminato
> 25-feb 19:55 webfs3: ClientRunBeforeJob: smbd: nessun processo terminato
> 25-feb 19:55 webfs3: ClientRunBeforeJob: Starting SMB services: [  OK  ]
> 25-feb 19:55 webfs3: ClientRunBeforeJob: [  OK  ]
> 25-feb 19:55 bacula-sd: 3307 Issuing autochanger "unload slot 7, drive
> 0" command.
> 25-feb 19:57 bacula-sd: 3304 Issuing autochanger "load slot 3, drive 0"
> command.
> 25-feb 19:57 bacula-sd: 3305 Autochanger "load slot 3, drive 0", status
> is OK.
> 25-feb 19:57 bacula-sd: 3301 Issuing autochanger "loaded? drive 0" command.
> 25-feb 19:57 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
> Slot 3.
> 25-feb 19:57 bacula-sd: Recycled volume "web-004" on device "LTO1"
> (/dev/lto1), all previous data lost.
> webfs3:      /proc is a different filesystem. Will not descend from /
> into /proc
> webfs3:      /boot is a different filesystem. Will not descend from /
> into /boot
> webfs3:      /dev is a different filesystem. Will not descend from /
> into /dev
> webfs3:      /var/lib/nfs/rpc_pipefs is a different filesystem. Will not
> descend from / into /var/lib/nfs/rpc_pipefs
> webfs3:      /sys is a different filesystem. Will not descend from /
> into /sys
> webfs3:      /uno is a different filesystem. Will not descend from /
> into /uno
> 26-feb 04:14 bacula-sd: End of Volume "web-004" at 594:6519 on device
> "LTO1" (/dev/lto1). Write of 64512 bytes got -1.
> 26-feb 04:14 bacula-sd: Re-read of last block succeeded.
> 26-feb 04:14 bacula-sd: End of medium on Volume "web-004"
> Bytes=594,382,602,240 Blocks=9,213,519 at 26-feb-2007 04:14.
> 26-feb 04:14 bacula-dir: Recycled volume "web-005"
> 26-feb 04:14 bacula-sd: 3301 Issuing autochanger "loaded? drive 0" command.
> 26-feb 04:14 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
> Slot 3.
> 26-feb 04:14 bacula-sd: 3307 Issuing autochanger "unload slot 3, drive
> 0" command.
> 26-feb 04:15 bacula-sd: 3304 Issuing autochanger "load slot 4, drive 0"
> command.
> 26-feb 04:15 bacula-sd: 3305 Autochanger "load slot 4, drive 0", status
> is OK.
> 26-feb 04:15 bacula-sd: 3301 Issuing autochanger "loaded? drive 0" command.
> 26-feb 04:15 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
> Slot 4.
> 26-feb 04:15 bacula-sd: Recycled volume "web-005" on device "LTO1"
> (/dev/lto1), all previous data lost.
> 26-feb 04:15 bacula-sd: New volume "web-005" mounted on device "LTO1"
> (/dev/lto1) at 26-feb-2007 04:15.
> 26-feb 10:21 bacula-sd: End of Volume "web-005" at 528:6656 on device
> "LTO1" (/dev/lto1). Write of 64512 bytes got -1.
> 26-feb 10:21 bacula-sd: Re-read of last block succeeded.
> 26-feb 10:21 bacula-sd: End of medium on Volume "web-005"
> Bytes=528,395,664,384 Blocks=8,190,656 at 26-feb-2007 10:21.
> 26-feb 10:21 bacula-dir: Recycled volume "web-006"
> 26-feb 10:21 bacula-sd: 3301 Issuing autochanger "loaded? drive 0" command.
> 26-feb 10:21 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
> Slot 4.
> 26-feb 10:21 bacula-sd: 3307 Issuing autochanger "unload slot 4, drive
> 0" command.
> 26-feb 10:22 bacula-sd: 3304 Issuing autochanger "load slot 5, drive 0"
> command.
> 26-feb 10:22 bacula-sd: 3305 Autochanger "load slot 5, drive 0", status
> is OK.
> 26-feb 10:22 bacula-sd: 3301 Issuing autochanger "loaded? drive 0" command.
> 26-feb 10:22 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
> Slot 5.
> 26-feb 10:23 bacula-sd: Recycled volume "web-006" on device "LTO1"
> (/dev/lto1), all previous data lost.
> 26-feb 10:23 bacula-sd: New volume "web-006" mounted on device "LTO1"
> (/dev/lto1) at 26-feb-2007 10:23.
> 26-feb 13:49 bacula-sd: Job write elapsed time = 17:48:45, Transfer rate
> = 21.65 M bytes/second
> 26-feb 13:49 bacula-sd: Alert: SCSI 2 tape drive:
> 26-feb 13:49 bacula-sd: Alert: File number=267, block number=0, partition=0.
> 26-feb 13:49 bacula-sd: Alert: Tape block size 0 bytes. Density code
> 0x44 (no translation).
> 26-feb 13:49 bacula-sd: Alert: Soft error count since last status=0
> 26-feb 13:49 bacula-sd: Alert: General status bits on (81010000):
> 26-feb 13:49 bacula-sd: Alert:  EOF ONLINE IM_REP_EN
> 26-feb 13:49 bacula-dir: Bacula 2.0.2 (28Jan07): 26-feb-2007 13:49:03
>  JobId:                  12927
>  Job:                    webfs3-job.2007-02-25_19.55.40
>  Backup Level:           Full
>  Client:                 "webfs3" 2.0.2 (28Jan07)
> i686-redhat-linux-gnu,redhat,Enterprise release
>  FileSet:                "webfs3-fileset" 2005-04-30 07:13:53
>  Pool:                   "webfs" (From Job resource)
>  Storage:                "LTO-1" (From user selection)
>  Scheduled time:         25-feb-2007 19:55:17
>  Start time:             25-feb-2007 19:55:46
>  End time:               26-feb-2007 13:49:03
>  Elapsed time:           17 hours 53 mins 17 secs
>  Priority:               10
>  FD Files Written:       4,046,880
>  SD Files Written:       4,046,880
>  FD Bytes Written:       1,387,910,783,372 (1.387 TB)
>  SD Bytes Written:       1,388,589,182,436 (1.388 TB)
>  Rate:                   21552.4 KB/s
>  Software Compression:   None
>  VSS:                    no
>  Encryption:             no
>  Volume name(s):         web-004|web-005|web-006
>  Volume Session Id:      1
>  Volume Session Time:    1172427565
>  Last Volume Bytes:      266,951,559,168 (266.9 GB)
>  Non-fatal FD errors:    0
>  SD Errors:              0
>  FD termination status:  OK
>  SD termination status:  OK
>  Termination:            Backup OK
>
>
> What argues against a hardware or OS problem is that, with the same
> hardware and OS, Bacula 1.36.3 did not have this problem; it first
> appeared with 1.38.11.
> The device setup is quite simple:
>
>
> Device {
>   Name = LTO1
>   Media Type = LTO-3
>   Archive Device = /dev/lto1
>   AutomaticMount = yes;               # when device opened, read it
>   AlwaysOpen = no;
>   Autoselect = no
>   RemovableMedia = yes;
>   RandomAccess = no;
>   Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
>   Changer Device = /dev/chg4
>   Drive Index = 0
>   AutoChanger = yes
>   Alert Command = "sh -c 'mt -f %a status'"
>   Maximum Network Buffer Size = 65536
> }
>
> Devices /dev/lto1 and /dev/chg4 are symlinks to real devices in order to
> manage hardware configuration changes.
>
> Thanks again
>
> --------------------------------------------------------------------------
> Ferdinando Pasqualetti
> G.T.Dati srl
> Tel. 0557310862 - 3356172731 - Fax 055720143
>
>
>
>
>
> *Arno Lehmann <[EMAIL PROTECTED]>*
> Sent by: [EMAIL PROTECTED]
> 26/02/2007 20.33
>
> To: bacula-users <bacula-users@lists.sourceforge.net>
> Subject: Re: [Bacula-users] Change tape problem
>
> Hello,
>
> On 2/26/2007 10:54 AM, Ferdinando Pasqualetti wrote:
>  >
>  > Hi Bacula users,
>  > sorry if you get this message two times, I sent it with a wrong sender
>  > (not in the list), so I am sending it again.
>  > I am facing a problem that first appeared with rev. 1.38.11 (I never saw
>  > it with 1.36.3). The problem did not happen every time, but very often.
>  > Now I have switched to 2.0.2 and the problem is much more frequent.
>  > The problem is that when a tape is exhausted, Bacula correctly changes
>  > the tape in the autochanger drive, but immediately afterwards gets this
>  > error:
>  >
>  > 25-feb 02:47 bacula-sd: End of Volume "web-004" at 594:3362 on device
>  > "LTO1" (/dev/lto1). Write of 64512 bytes got -1.
>  > 25-feb 02:47 bacula-sd: Re-read of last block succeeded.
>  > 25-feb 02:47 bacula-sd: End of medium on Volume "web-004"
>  > Bytes=594,178,937,856 Blocks=9,210,362 at 25-feb-2007 02:47.
>  > 25-feb 02:47 bacula-sd: 3301 Issuing autochanger "loaded? drive 0"
>  > command.
>  > 25-feb 02:47 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
>  > Slot 3.
>  > 25-feb 02:47 bacula-sd: 3307 Issuing autochanger "unload slot 3, drive
>  > 0" command.
>  > 25-feb 02:48 bacula-sd: 3304 Issuing autochanger "load slot 4, drive 0"
>  > command.
>  > 25-feb 02:48 bacula-sd: 3305 Autochanger "load slot 4, drive 0", status
>  > is OK.
>  > 25-feb 02:48 bacula-sd: 3301 Issuing autochanger "loaded? drive 0"
>  > command.
>  > 25-feb 02:48 bacula-sd: 3302 Autochanger "loaded? drive 0", result is
>  > Slot 4.
>  > 25-feb 02:49 bacula-sd: Wrote label to prelabeled Volume "web-005" on
>  > device "LTO1" (/dev/lto1)
>  > 25-feb 02:49 bacula-sd: New volume "web-005" mounted on device "LTO1"
>  > (/dev/lto1) at 25-feb-2007 02:49.
>  > 25-feb 02:49 bacula-sd: End of Volume "web-005" at 0:1 on device "LTO1"
>  > (/dev/lto1). Write of 64512 bytes got -1.
>  > 25-feb 02:49 bacula-sd: webfs3-job.2007-02-24_20.03.22 Error: Re-read of
>  > last block OK, but block numbers differ. Last block=0 Current
>  > block=9210362.
>  > 25-feb 02:49 bacula-sd: Job write elapsed time = 06:43:26, Transfer rate
>  > = 24.52 M bytes/second
>  > 25-feb 02:49 webfs3: webfs3-job.2007-02-24_20.03.22 Fatal error:
>  > backup.c:860 Network send error to SD. ERR=Pipe rotta
>  > 25-feb 02:49 bacula-dir: webfs3-job.2007-02-24_20.03.22 Error: Bacula
>  > 2.0.2 (28Jan07): 25-feb-2007 02:49:25
>  >
>  > It seems there are two problems. The first one (and the most important
>  > one) is that Bacula gets an end of volume on the new tape,
>
> What Bacula reports as an EOT can be caused by a drive error, too, so
> for the time being I assume that the second error is tightly related to
> this one.
>
>  > and the second
>  > one is the difference in the last block (it appears to be the last block
>  > of the previous tape).
>
> If that's the case, and your description seems quite clear, you might
> have found an OS or hardware bug, too.
>
> This is only guesswork, but it could be possible that, after a tape
> change, the hardware or the tape driver don't update their state
> information.
>
> If that's the case, you could try the following:
> - first, have a look at your system log and dmesg output. There might be
>  errors reported there.
> - second, try to reproduce the problem without using Bacula. Unmount the
> tape drive from bconsole. Load a tape (an unused one, or one with write
> protection). If you use an empty tape, write some data and some file
> marks to it, ending with an EOT mark. dd and mt are tools for that purpose.
> Then, use tapeinfo or st to observe the tape status, especially the block
> position reported, when doing some rewinds, fast forwards, offline, and
> see what happens after you used mtx to unload and reload that tape.
>
> If there really is a problem with the hardware or the OS driver, you
> should be able to reproduce it then. Updating the drive firmware and the
> OS (or, if that's up to date, filing a bug report) would be two options
> then.
>
> Otherwise, you should run btape again, because there are some things in
> the report I don't like - errors writing the last block to tape should
> not happen with current hardware, for example. You might try to tune
> your device configuration, and perhaps you'll have to set the tape
> driver to a different write mode. Suggesting something is difficult
> without seeing how it's setup now :-)
>
>  > Bacula is a MySQL version on RedHat AS 4.04, rpmbuilt on that system,
>  > an HP ProLiant G3, 3.2 GHz, 2 GB RAM.
>  > The tape library is an MSL6000 with two LTO-3 drives, driven by Bacula
>  > directly (not using the autochanger as device - 1.36.3 setup).
>  > Btape tests run correctly, including the "fill and change tape" (I am
>  > attaching the test result, if someone is interested).
>  > Did anyone get a similar problem?
>
> That basic setup should run ok I think... nothing unusual there.
>
> Arno
>
>  >
>  >
> --------------------------------------------------------------------------
>  > Ferdinando Pasqualetti
>  > G.T.Dati srl
>  > Tel. 0557310862 - 3356172731 - Fax 055720143
>  >
>  >
>  >
>  > ------------------------------------------------------------------------
>  >
>  > -------------------------------------------------------------------------
>  > Take Surveys. Earn Cash. Influence the Future of IT
>  > Join SourceForge.net's Techsay panel and you'll get the chance to
> share your
>  > opinions on IT & business topics through brief surveys-and earn cash
>  > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>  >
>  >
>  > ------------------------------------------------------------------------
>  >
>  > _______________________________________________
>  > Bacula-users mailing list
>  > Bacula-users@lists.sourceforge.net
>  > https://lists.sourceforge.net/lists/listinfo/bacula-users
>
> --
> IT-Service Lehmann                    [EMAIL PROTECTED]
> Arno Lehmann                  http://www.its-lehmann.de
>

--
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de

