Hi Jean-Louis,
I thought I had applied the patches on this machine also, but it turns out I
didn't (sorry about that). I applied the patches and ran a new dump. This
time all 4 file systems succeeded, though /usr got the same "strange"
message as on the other machine:
1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: start at Mon Aug 31 14:58:02 2009
1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009
1251752282.527795: sendbackup: Parsed request as: program `DUMP'
1251752282.527880: sendbackup: disk `/usr'
1251752282.527931: sendbackup: device `/usr'
1251752282.527978: sendbackup: level 0
1251752282.528025: sendbackup: since NODATE
1251752282.528072: sendbackup: options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 -
/dev/rsd0d" in pipeline
1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: "/sbin/restore -tvf -
2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
1251752282.627853: sendbackup: 90: normal(|): DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup: 90: normal(|): DUMP: Date of last level 0
dump: the epoch
1251752282.754862: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0d
(/usr) to standard output
1251752282.792706: sendbackup: 90: normal(|): DUMP: mapping (Pass I)
[regular files]
1251752286.777085: sendbackup: 90: normal(|): DUMP: mapping (Pass II)
[directories]
1251752286.779538: sendbackup: 90: normal(|): DUMP: estimated 653358
tape blocks.
1251752286.789190: sendbackup: 90: normal(|): DUMP: Volume 1 started at:
Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup: 90: normal(|): DUMP: dumping (Pass III)
[directories]
1251752290.646004: sendbackup: 90: normal(|): DUMP: dumping (Pass IV)
[regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource
temporarily unavailable
1251752424.852658: sendbackup: 43: size(|): DUMP: 664333 tape blocks
1251752424.854957: sendbackup: 90: normal(|): DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup: 90: normal(|): DUMP: Volume 1 completed
at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup: 90: normal(|): DUMP: Volume 1 took
0:02:18
1251752424.860994: sendbackup: 90: normal(|): DUMP: Volume 1 transfer
rate: 4814 KB/s
1251752424.863042: sendbackup: 90: normal(|): DUMP: Date this dump
completed: Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup: 90: normal(|): DUMP: Average transfer
rate: 4814 KB/s
1251752424.869138: sendbackup: 90: normal(|): DUMP: level 0 dump on Mon
Aug 31 14:58:02 2009
1251752424.878026: sendbackup: 90: normal(|): DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24
2009
So now I've seen the same behavior that Stan noted: applying the debug patch
seems to "fix" the dump.
Also, in case you didn't see it before, here is a snippet from an email
thread Stan started on an OpenBSD list:
> > I am trying to get Amanda of a recent vintage working on 4.5. The
developrs
> > don't understand how writing to a blocking pipe can return EAGAIN.
> > Should the write just block?
>
> Yes, it should, and a quick, about-to-go-to-bed readthrough of the
> pipe write path shows that it shouldn't; is Amanda doing fcntls to
> the fd? If so, check those to be absolutely sure that they're not
> doing something hinkey.
Thanks,
Michael
On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau
<[email protected]>wrote:
> Do you have the patch I sent to stan on this system?
>
> The patch check before and after the write if the pipe is in O_NONBLOCK or
> not and give an error if it is.
>
> I'm totally lost since it is in blocking mode and you get EAGAIN, which is
> impossible!!!!
>
> Jean-louis
>
> Michael Burk wrote:
>
>> Here are the two sendbackup.*.debug files for the / fs.
>>
>> First:
>>
>> 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
>> 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
>> 1251693102.345326: sendbackup: Version 2.6.2alpha
>> 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
>> 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
>> 1251693102.382860: sendbackup: Parsed request as: program `DUMP'
>> 1251693102.382940: sendbackup: disk `/'
>> 1251693102.382991: sendbackup: device `/'
>> 1251693102.383039: sendbackup: level 0
>> 1251693102.383086: sendbackup: since NODATE
>> 1251693102.383134: sendbackup: options `'
>> 1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
>> 1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
>> 1251693102.398325: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 -
>> /dev/rsd0a" in pipeline
>> 1251693102.403120: sendbackup: Started backup
>> 1251693102.461663: sendbackup: 90: normal(|): DUMP: Date of this level
>> 0 dump: Sun Aug 30 22:31:42 2009
>> 1251693102.468669: sendbackup: 90: normal(|): DUMP: Date of last level
>> 0 dump: the epoch
>> 1251693102.487536: sendbackup: Started index creator: "/sbin/restore -tvf
>> - 2>&1 | sed -e '
>> s/^leaf[ ]*[0-9]*[ ]*\.//
>> t
>> /^dir[ ]/ {
>> s/^dir[ ]*[0-9]*[ ]*\.//
>> s%$%/%
>> t
>> }
>> d
>> '"
>> 1251693102.519939: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a
>> (/) to standard output
>> 1251693102.588086: sendbackup: 90: normal(|): DUMP: mapping (Pass I)
>> [regular files]
>> 1251693104.583392: sendbackup: 90: normal(|): DUMP: mapping (Pass II)
>> [directories]
>> 1251693104.585824: sendbackup: 90: normal(|): DUMP: estimated 35134
>> tape blocks.
>> 1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume 1 started
>> at: Sun Aug 30 22:31:44 2009
>> 1251693104.615444: sendbackup: 90: normal(|): DUMP: dumping (Pass III)
>> [directories]
>> 1251693104.919691: sendbackup: 90: normal(|): DUMP: dumping (Pass IV)
>> [regular files]
>> 1251693105.800795: sendbackup: critical (fatal): index tee cannot write
>> [Resource temporarily unavailable]
>> 1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee
>> cannot write [Resource temporarily unavailable]
>> 1251693105.827339: sendbackup: 90: normal(|): DUMP: Broken pipe
>> 1251693105.832068: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is
>> aborted.
>> 1251693105.839675: sendbackup: critical (fatal): error [dump (7257)
>> /sbin/dump returned 3]
>>
>> Second:
>>
>> 1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version
>> 2.6.2alpha: start at Sun Aug 30 22:32:03 2009
>> 1251693123.841613: sendbackup: Version 2.6.2alpha
>> 1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version
>> 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
>> 1251693123.896360: sendbackup: Parsed request as: program `DUMP'
>> 1251693123.896437: sendbackup: disk `/'
>> 1251693123.896487: sendbackup: device `/'
>> 1251693123.896535: sendbackup: level 0
>> 1251693123.896584: sendbackup: since NODATE
>> 1251693123.896632: sendbackup: options `'
>> 1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
>> 1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
>> 1251693123.911778: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 -
>> /dev/rsd0a" in pipeline
>> 1251693123.916860: sendbackup: Started backup
>> 1251693123.992261: sendbackup: Started index creator: "/sbin/restore -tvf
>> - 2>&1 | sed -e '
>> s/^leaf[ ]*[0-9]*[ ]*\.//
>> t
>> /^dir[ ]/ {
>> s/^dir[ ]*[0-9]*[ ]*\.//
>> s%$%/%
>> t
>> }
>> d
>> '"
>> 1251693123.995743: sendbackup: 90: normal(|): DUMP: Date of this level
>> 0 dump: Sun Aug 30 22:32:03 2009
>> 1251693124.009013: sendbackup: 90: normal(|): DUMP: Date of last level
>> 0 dump: the epoch
>> 1251693124.038977: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a
>> (/) to standard output
>> 1251693124.143454: sendbackup: 90: normal(|): DUMP: mapping (Pass I)
>> [regular files]
>> 1251693126.173916: sendbackup: 90: normal(|): DUMP: mapping (Pass II)
>> [directories]
>> 1251693126.176287: sendbackup: 90: normal(|): DUMP: estimated 35134
>> tape blocks.
>> 1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume 1 started
>> at: Sun Aug 30 22:32:06 2009
>> 1251693126.205305: sendbackup: 90: normal(|): DUMP: dumping (Pass III)
>> [directories]
>> 1251693126.505055: sendbackup: 90: normal(|): DUMP: dumping (Pass IV)
>> [regular files]
>> 1251693126.797271: sendbackup: critical (fatal): index tee cannot write
>> [Resource temporarily unavailable]
>> 1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee
>> cannot write [Resource temporarily unavailable]
>> 1251693126.821488: sendbackup: 90: normal(|): DUMP: Broken pipe
>> 1251693126.825377: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is
>> aborted.
>> 1251693126.833573: sendbackup: critical (fatal): error [dump (22791)
>> /sbin/dump returned 3]
>>
>> Thanks,
>> Michael
>>
>> On Mon, Aug 31, 2009 at 12:49 PM, Jean-Louis Martineau <
>> [email protected] <mailto:[email protected]>> wrote:
>>
>> You get the error in the index pipe instead of the data path. The
>> backup is correct but your index is empty. That's why you get a
>> STRANGE result instead of a failure.
>>
>> Can you post a sendbackup.*.debug for a dle that failed?
>>
>> Jean-Louis
>>
>> Michael Burk wrote:
>>
>> Hello,
>>
>> I applied the patches to the 20090827 snapshot. I tried it on
>> two OpenBSD 4.5 sparc64 systems, forcing both to do full
>> backups. One system seemed to work on all 3 file systems, the
>> other failed on all 4 file systems with the same errors as
>> before (exactly like what Stan reported). I'm using bsdtcp auth.
>>
>> The system that worked, however, did give a "strange" result
>> in the report on one filesystem. Here's the corresponding
>> "sendbackup" debug file:
>>
>> 1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150
>> version 2.6.2alpha: start at Sun Aug 30 2
>> 2:25:00 2009
>> 1251692700.530411: sendbackup: Version 2.6.2alpha
>> 1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150
>> version 2.6.2alpha: rename at Sun Aug 30
>> 22:25:00 2009
>> 1251692700.555394: sendbackup: Parsed request as: program `DUMP'
>> 1251692700.555435: sendbackup: disk `/usr'
>> 1251692700.555461: sendbackup: device `/usr'
>> 1251692700.555484: sendbackup: level 0
>> 1251692700.555507: sendbackup: since NODATE
>> 1251692700.555530: sendbackup: options `'
>> 1251692700.556750: sendbackup: start:
>> zirconium.example.com:/usr lev 0
>> 1251692700.559495: sendbackup: dumping device '/dev/rsd1d'
>> with 'ffs'
>> 1251692700.566094: sendbackup: Spawning "/sbin/dump dump 0usf
>> 1048576 - /dev/rsd1d" in pipeline
>> 1251692700.571185: sendbackup: Started backup
>> 1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK
>> before dup2
>> 1251692700.597388: sendbackup: Started index creator:
>> "/sbin/restore -tvf - 2>&1 | sed -e '
>> s/^leaf[ ]*[0-9]*[ ]*\.//
>> t
>> /^dir[ ]/ {
>> s/^dir[ ]*[0-9]*[ ]*\.//
>> s%$%/%
>> t
>> }
>> d
>> '"
>> 1251692700.611526: sendbackup: 90: normal(|): DUMP: Date
>> of this level 0 dump: Sun Aug 30 22:25:
>> 00 2009
>> 1251692700.669703: sendbackup: 90: normal(|): DUMP: Date
>> of last level 0 dump: the epoch
>> 1251692700.670724: sendbackup: 90: normal(|): DUMP:
>> Dumping /dev/rsd1d (/usr) to standard output
>> 1251692700.763600: sendbackup: 90: normal(|): DUMP:
>> mapping (Pass I) [regular files]
>> 1251692705.618345: sendbackup: 90: normal(|): DUMP:
>> mapping (Pass II) [directories]
>> 1251692705.620040: sendbackup: 90: normal(|): DUMP:
>> estimated 1330085 tape blocks.
>> 1251692705.623976: sendbackup: 90: normal(|): DUMP: Volume
>> 1 started at: Sun Aug 30 22:25:05 2009
>> 1251692705.650687: sendbackup: 90: normal(|): DUMP:
>> dumping (Pass III) [directories]
>> 1251692745.051547: sendbackup: 90: normal(|): DUMP:
>> dumping (Pass IV) [regular files]
>> 1251692747.915767: sendbackup: 115: strange(?): sed: stdout:
>> Resource temporarily unavailable
>> 1251692954.633083: sendbackup: 43: size(|): DUMP:
>> 1437829 tape blocks
>> 1251692954.637414: sendbackup: 90: normal(|): DUMP: Date
>> of this level 0 dump: Sun Aug 30 22:25:00 2009
>> 1251692954.641414: sendbackup: 90: normal(|): DUMP: Volume
>> 1 completed at: Sun Aug 30 22:29:14 2009
>> 1251692954.641858: sendbackup: Index pipe exited with status 1
>> 1251692954.642719: sendbackup: 90: normal(|): DUMP: Volume
>> 1 took 0:04:09
>> 1251692954.643918: sendbackup: 90: normal(|): DUMP: Volume
>> 1 transfer rate: 5774 KB/s
>> 1251692954.645095: sendbackup: 90: normal(|): DUMP: Date
>> this dump completed: Sun Aug 30 22:29:14 2009
>> 1251692954.646239: sendbackup: 90: normal(|): DUMP:
>> Average transfer rate: 5774 KB/s
>> 1251692954.647430: sendbackup: 90: normal(|): DUMP: level
>> 0 dump on Sun Aug 30 22:25:00 2009
>> 1251692954.653480: sendbackup: 90: normal(|): DUMP: DUMP
>> IS DONE
>> 1251692954.653841: sendbackup: Parsed backup messages
>> 1251692954.654032: sendbackup: pid 30459 finish time Sun Aug
>> 30 22:29:14 2009
>>
>> -- Michael
>>
>> On Fri, Aug 28, 2009 at 5:22 AM, stan <[email protected]
>> <mailto:[email protected]> <mailto:[email protected]
>>
>> <mailto:[email protected]>>> wrote:
>>
>> On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
>> > Cool - can you send me the patch? Or is it already in the
>> 0826
>> snapshot?
>> >
>> > FYI - I got 2.5.0p2 working, but could never get 2.5.1p3
>> working, even with
>> > the same config. No idea why.
>> > I started looking at the source code last night in the 0825
>> snapshot. I'll
>> > discontinue that search if there's a patch.
>> >
>> This is interesting. I sent Jean-Louis Martineau an email,
>> saying
>> teaks for
>> fixing things, and he replied that the patch he sent me
>> just added
>> debugging,
>> it was not intended to fix anything.
>>
>> Bit, using this snapshot 20090813, and the patch, which I will
>> send to you.
>> I have it working on my 3 OpenBSD 4.5 machines that I have
>> upgraded so far.
>> I ran several test yesterday, and a full fledged backup run
>> last
>> night,
>> and I have not seen any issues since installing this code.
>>
>> I am going to attach the patch to this email.
>>
>> I would greatly appreciate it, if you would apply this patch to
>> the above
>> snapshot, test, and tell me what your results are.
>>
>> I am very puzzled at the moment.
>>
>> BTW, is your failure the same as mine? That is amcheck
>> passes, and
>> most
>> times you get a PARTIAL backup?
>>
>> BTW, I am using bsdtcp "auth" and client compression now, but I
>> have made
>> this work with this code, with bsd auth, and no
>> compression, so I
>> don't
>> think that has any affect on this issue.
>>
>> Thanks for the help!
>>
>> --
>> One of the main causes of the fall of the roman empire was
>> that,
>> lacking
>> zero, they had no way to indicate successful termination of
>> their C
>> programs.
>>
>>
>>
>>
>>
>