The taper log from last night looks exactly like the amcheck log, showing the same inventory followed by the updating state and the same three repetitions of the warning at line 102.

The taper log from the night before, when it was successful, ends with this:

   Thu Oct 08 23:30:04.243540450 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   36:Full :VolumeTag=000055L7
   Thu Oct 08 23:30:04.243588794 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   37:Full :VolumeTag=000040L7
   Thu Oct 08 23:30:04.243636849 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   38:Full :VolumeTag=000041L7
   Thu Oct 08 23:30:04.243684260 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   39:Full :VolumeTag=000042L7
   Thu Oct 08 23:30:04.243727126 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   40:Empty
   Thu Oct 08 23:30:04.243765952 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   41:Empty
   Thu Oct 08 23:30:04.243804416 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   42:Empty
   Thu Oct 08 23:30:04.243847604 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   43:Full :VolumeTag=000043L7
   Thu Oct 08 23:30:04.243891382 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   44:Empty
   Thu Oct 08 23:30:04.243931546 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   45:Empty
   Thu Oct 08 23:30:04.243969440 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   46:Empty
   Thu Oct 08 23:30:04.244012542 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   47:Full :VolumeTag=CLN001CU
   Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   48:Full :VolumeTag=CLN002CU
   Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: updating state
   Thu Oct 08 23:30:04.253017380 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: too early for
   another 'status' invocation
   Thu Oct 08 23:30:04.253371305 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: loading slot '4'
   Thu Oct 08 23:30:04.253422930 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: requested volume
   is already in drive 0
   Thu Oct 08 23:30:04.253728335 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: polling
   'tape:/dev/nst1' to see if it's ready
   Thu Oct 08 23:30:04.256414884 2020: pid 18920: thd-0x28b4800: taper: Device 
is in variable block
   size
   Thu Oct 08 23:30:23.300747859 2020: pid 18920: thd-0x28b4800: taper: Slot 4 
with label
   Bio-Research-058 is usable
   Thu Oct 08 23:30:23.304473715 2020: pid 18920: thd-0x28b4800: taper: 
Amanda::Taper::Scribe
   preparing to write, part size 0, using no cache (PEOM will be fatal) 
(splitter)  (no LEOM)
   Thu Oct 08 23:30:23.304769712 2020: pid 18920: thd-0x28b4800: taper: Starting 
<Xfer@0x3dfcae0
   (<XferSourceHolding@0x3dc1000> -> <XferDestTaperSplitter@0x3dc20c0>)>
   Thu Oct 08 23:30:23.304819253 2020: pid 18920: thd-0x28b4800: taper: Final 
linkage:
   <XferSourceHolding@0x3dc1000> -(MEM_RING)-> <XferDestTaperSplitter@0x3dc20c0>
   Thu Oct 08 23:30:23.305432752 2020: pid 18920: thd-0x28b4800: taper: header 
native_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305451271 2020: pid 18920: thd-0x28b4800: taper: header 
client_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305465429 2020: pid 18920: thd-0x28b4800: taper: header 
server_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305552797 2020: pid 18920: thd-0x28b4800: taper: 
start_recovery called
   Thu Oct 08 23:30:23.325029323 2020: pid 18920: thd-0x28b4800: taper: 
Building type TAPESTART
   header of 524288-524288 bytes with name='Bio-Research-058' disk='' 
dumplevel=0 and blocksize=524288
   Thu Oct 08 23:30:31.666780210 2020: pid 18920: thd-0x28b4800: taper: 
invoking /usr/sbin/mtx -f
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status
   Thu Oct 08 23:30:32.316054467 2020: pid 18920: thd-0x28b4800: taper: mtx:   
Storage Changer
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 
0 Import/Export )
   Thu Oct 08 23:30:32.316179280 2020: pid 18920: thd-0x28b4800: taper: mtx: 
Data Transfer Element
   0:Full (Storage Element 4 Loaded):VolumeTag = 000058L7
   Thu Oct 08 23:30:32.316269484 2020: pid 18920: thd-0x28b4800: taper: mtx: 
Data Transfer Element
   1:Empty
   Thu Oct 08 23:30:32.316314827 2020: pid 18920: thd-0x28b4800: taper: mtx:    
   Storage Element
   1:Empty

I got 6.8TB on tape the night the above log represents.

A successful amcheck-device log from a couple of days ago looks just like that.

There are no log files with a name that includes "change".


On 10/10/20 6:20 PM, Nathan Stratton Treadway wrote:
On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote:
Also

8 lines like this in the kern.log:

    Oct  9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: 
segfault at 0 ip
    00007f94159617c6 sp 00007fff61039da8 error 4 in 
libc-2.23.so[7f94158d6000+1c0000]

Well, that doesn't really tell us more more about what is going wrong,
other than the slight hint the problem is all the way down in the the
libc library somehow.

I don't know how much you are trying to investigate further at this
point in your furlough schedule... but I still feel that comparing the
log from a succesfull run to this aborted run has the best chance of
generating a hint as to exactly what operation is underway at the point
of failure.


Also, the taper/changer logs from last night's run should give some hint
as to what it was attempting, and perhaps those logs will be different
enough from the amcheck-device logs that it'll give some new
information....

(The last operation that appears to be happening in your quoted
amcheck-device log lines is a scan through the tape-changer inventory.
I don't have a physical tape-drive changer myself so I don't have any
guesses as to what could be wrong, but based on the history of the
situation you described it does seem plausible that the inventory
database it's working from could contain some "unexpected" data of some
sort...)

                                                Nathan


----------------------------------------------------------------------------
Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~~~~~~~~~ - University of Massachusetts, Amherst

<[email protected]>

---------------

Erdös 4

Reply via email to