Hi Joshua,

could you post the Versionnumbers of the daemons you are using (i.e. fd, sd, & dir) ?

Kind Regards,
Sebastian Sura

Am 23.07.24 um 19:03 schrieb Joshua Myles:
Another thing I just noticed is that all of the failed jobs had an Elapsed time right around 30 minutes, some just under 30 but all under 31. I did rule out the (external) firewall timeout, which is 240 minutes, and am looking through OS and application timeouts for anything around 30 minutes.

Josh

On Tue, Jul 23, 2024 at 11:28 AM Joshua Myles <[email protected]> wrote:

    One of our Virtual Full jobs has been failing every day during
    Always Incremental consolidation, and I'm having trouble figuring
    out why.

    *list joblog jobid=18065
    Automatically selected Catalog: MyCatalog
    Using Catalog "MyCatalog"
     2024-07-23 09:01:22 bareos-dir-prod JobId 18065: Start Virtual
    Backup JobId 18065, Job=node0057-AI.2024-07-23_09.00.04_41
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Bootstrap
    records written to /var/lib/bareos/bareos-dir-prod.restore.45.bsr
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Consolidating
    JobIds 17646,13356,13422,13488,13556,13625,13694 containing
    2025684 files
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Connected
    Storage daemon at bareos-sd-t1-prod.foo.bar.edu:9103
    <http://bareos-sd-t1-prod.foo.bar.edu:9103>, encryption:
    TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065:  Encryption:
    TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device
    "FileStorage5" to read.
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Max configured
    use duration=72,000 sec. exceeded. Marking Volume
    "node0057-AI-Consolidated-12250" as Used.
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Created new
    Volume "node0057-AI-Consolidated-12276" in catalog.
     2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device
    "FileStorageConsolidated5" to write.
     2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Labeled new
    Volume "node0057-AI-Consolidated-12276" on device
    "FileStorageConsolidated5" (/var/lib/bareos/storage).
     2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Wrote label to
    prelabeled Volume "node0057-AI-Consolidated-12276" on device
    "FileStorageConsolidated5" (/var/lib/bareos/storage)
     2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Ready to read
    from volume "node0057-AI-Consolidated-6560" on device
    "FileStorage5" (/var/lib/bareos/storage).
     2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Forward
    spacing Volume "node0057-AI-Consolidated-6560" to file:block 0:274.
     2024-07-23 09:02:15 bareos-dir-prod JobId 18065: Insert of
    attributes batch table with 800001 entries start
     2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of
    attributes batch table done
     2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Fatal error:
    Director's comm line to SD dropped.
     2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of
    attributes batch table with 3303 entries start
     2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of
    attributes batch table done
     2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Replicating
    deleted files from jobids
    17646,13356,13422,13488,13556,13625,13694 to jobid 18065
     2024-07-23 09:03:03 bareos-dir-prod JobId 18065: Error: Bareos
    bareos-dir-prod 22.1.4 (28Feb24):
      Build OS:               Red Hat Enterprise Linux release 9.1 (Plow)
      JobId:                  18065
      Job:  node0057-AI.2024-07-23_09.00.04_41
      Backup Level:           Virtual Full
      Client:                 "node0057.foo.bar.edu-fd" 22.1.5
    (04Jun24) Red Hat Enterprise Linux Server release 7.9 (Maipo),redhat
      FileSet:                "LinuxAll" 2023-10-22 13:39:16
      Pool:                   "node0057-AI-Consolidated" (From Job
    Pool's NextPool resource)
      Catalog:                "MyCatalog" (From Client resource)
      Storage:                "bareos-sd-t1-prod-Consolidated" (From
    Storage from Pool's NextPool resource)
      Scheduled time:         23-Jul-2024 09:00:04
      Start time:             21-May-2024 02:00:01
      End time:               21-May-2024 02:30:31
      Elapsed time:           30 mins 30 secs
      Priority:               10
      SD Files Written:       0
      SD Bytes Written:       110,846,184 (110.8 MB)
      Rate:                   60.6 KB/s
      Volume name(s):         node0057-AI-Consolidated-12276
      Volume Session Id:      19
      Volume Session Time:    1721645047
      Last Volume Bytes:      275 (275 B)
      SD Errors:              0
      SD termination status:  Error
      Accurate:               yes
      Bareos binary info:     Bareos subscription release
      Job triggered by:       User
      Termination:            *** Backup Error ***

    *

    Director and SD are separate hosts, and this issue seems to
    persist only with jobs from this client, node0057. I enabled debug
    tracing on the SD but haven't seen anything that makes sense to me.

    ...
    bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write
    JobId=18065 FI=804499 SessId=19 Strm=1998 len=65
    bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write
    JobId=18065 FI=804499 SessId=19 Strm=MD5 len=16
    bareos-sd-t1-prod (100): stored/mac.cc:655-18065 ok=0
    bareos-sd-t1-prod (130): stored/label.cc:627-18065 session_label
    record=fc052df8
    bareos-sd-t1-prod (150): stored/label.cc:652-18065 Write
    sesson_label record JobId=18065 FI=EOS_LABEL SessId=19 Strm=18065
    len=234 remainder=0
    bareos-sd-t1-prod (150): stored/label.cc:660-18065 Leave
    WriteSessionLabel Block=390293886d File=0d
    bareos-sd-t1-prod (100): stored/block.cc:567-18065 return
    WriteBlockToDev, job is canceled
    bareos-sd-t1-prod (100): stored/mac.cc:684-18065 Set ok=FALSE
    after WriteBlockToDevice.
    bareos-sd-t1-prod (200): stored/mac.cc:687-18065 Flush block to
    device pos 0:390293886
    bareos-sd-t1-prod (100): stored/acquire.cc:538-18065 releasing
    device "FileStorageConsolidated5" (/var/lib/bareos/storage)
    bareos-sd-t1-prod (100): stored/acquire.cc:560-18065 There are 0
    writers in ReleaseDevice
    bareos-sd-t1-prod (50): stored/askdir.cc:366-18065 >dird UpdCat
    Job=node0057-AI.2024-07-23_09.00.04_41 FileAttributes
    bareos-sd-t1-prod (50): stored/askdir.cc:369-18065 create_jobmedia
    error BnetRecv
    bareos-sd-t1-prod (200): stored/mac.cc:229-18062 bareos-sd-t1-prod
    (200): stored/acquire.cc:568-18065 ===== Wrote block new pos
    2:4028935146
    bareos-sd-t1-prod (50): stored/askdir.cc:298-18065 Update cat
    VolBytes=390293887
    bareos-sd-t1-prod (50): stored/askdir.cc:317-18065 >dird
    bareos-sd-t1-prod (200): stored/acquire.cc:587-18065
    dir_update_vol_info. Release vol=node0057-AI-Consolidated-12276
    dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
    bareos-sd-t1-prod (150): stored/vol_mgr.cc:695-18065 === clear
    in_use vol=node0057-AI-Consolidated-12276
    bareos-sd-t1-prod (150): stored/vol_mgr.cc:712-18065 === set not
    reserved vol=node0057-AI-Consolidated-12276 num_writers=0
    dev_reserved=0 dev="FileStorageConsolidated5"
    (/var/lib/bareos/storage)
    bareos-sd-t1-prod (150): stored/vol_mgr.cc:740-18065 === clear
    in_use vol=node0057-AI-Consolidated-12276
    bareos-sd-t1-prod (150): stored/vol_mgr.cc:751-18065 === remove
    volume node0057-AI-Consolidated-12276
    dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
    ...

    I can provide client/job/pool/storage configurations if they seem
    relevant here, and am continuing to poke at this myself. Thanks
    for any thoughts on troubleshooting.

    Josh
-- You received this message because you are subscribed to the Google
    Groups "bareos-users" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/bareos-users/4a802f7a-69e6-4cff-9756-f6f089c8aa3en%40googlegroups.com
    
<https://groups.google.com/d/msgid/bareos-users/4a802f7a-69e6-4cff-9756-f6f089c8aa3en%40googlegroups.com?utm_medium=email&utm_source=footer>.



--
Joshua Myles
he/him
Senior Storage Administrator
Michigan Tech IT

We can help.
mtu.edu/it <https://mtu.edu/it>
906-369-1870
--
You received this message because you are subscribed to the Google Groups "bareos-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/bareos-users/CAE3eABeUCC_arNSNQ5FPsdR9yRHp2AZJOxx%2B69z7kNa-TiuxKg%40mail.gmail.com <https://groups.google.com/d/msgid/bareos-users/CAE3eABeUCC_arNSNQ5FPsdR9yRHp2AZJOxx%2B69z7kNa-TiuxKg%40mail.gmail.com?utm_medium=email&utm_source=footer>.

--
 Sebastian [email protected]
 Bareos GmbH & Co. KG            Phone: +49 221 630693-0
 https://www.bareos.com
 Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
 Komplementär: Bareos Verwaltungs-GmbH
 Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz

--
You received this message because you are subscribed to the Google Groups 
"bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/bareos-users/de6dcac8-546a-4bab-ab01-3512b2ca7b90%40bareos.com.

Reply via email to