Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

David L.A. De Leeuw Sun, 20 Aug 2023 10:25:40 -0700

Hi Chavdar and Michael,

Thanks for your thoughts and help.

I added "memoryefficientbackup". 

But the sessions still keep crashing. Once a session crashes, I get a whole 
bunch of errors for storage pool directories, and in fact the whole pool 
becomes unavailable. 
I run "update stgpooldir ... access=readwrite" and all is accessible again.
Some of the containers are in the unavailable state and need an audit. 

Our container storage is on a Dell PowerEdge R730xd with 24 CPUs allocated, 64 
GB memory, and 110 TB disk.  The disks are declared as VMDKs.  The network is on 
a 10Gb Intel 82588 card.
Nothing I can see points to a lack of resources.

Everything worked fine until 4 days ago. That is why I suspected the Windows 
updates, but I rolled them back and the failures continue, so that does not 
make sense.

I am quite at a loss where to look next ...

Thanks

David

[Server side]
20-08-2023 19:47:22      ANR0839I Session 197902 started for node MEDFS2 (WinNT)
                          (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
                          STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
20-08-2023 19:47:26      ANR8592I Session 197903 connection is using protocol
                          TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
                          certificate TSM Self-Signed Certificate. (SESSION:
                          197903)
20-08-2023 19:47:26      ANR0839I Session 197903 started for node MEDFS2 (WinNT)
                          (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
                          STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
20-08-2023 19:47:55      ANR2012W Error encountered for storage pool directory:
                          \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
                          CPOOL. (SESSION: 197881)
20-08-2023 19:47:55      ANR1181E sdtxn.c(1404): Data storage transaction
                          0:83236375 was aborted. (SESSION: 197881)
20-08-2023 19:47:55      ANR0204I The container state for
                          \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.ncf
                          is updated from AVAILABLE to UNAVAILABLE. (SESSION:
                          197883)
20-08-2023 19:47:55      ANR3660E An unexpected error occurred while opening or
                          writing to the container. Container
                          \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.ncf
                          in stgpool CPOOL has been marked as UNAVAILABLE and
                          should be audited to validate accessibility and content.
                          (SESSION: 197883)
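
To collect every container that an activity log excerpt reports as UNAVAILABLE, 
something like the following Python sketch might help. It is only a sketch under 
assumptions: the function names are hypothetical, it assumes a trailing "-" marks 
a token that the log display split across lines, and it assumes container paths 
contain no spaces. It prints audit commands in the same form used elsewhere in 
this thread; it does not run anything against the server.

```python
import re

# ANR0204I reports a container state change; capture the container path.
# Assumption: container paths contain no whitespace.
STATE_CHANGE = re.compile(
    r"ANR0204I The container state for\s+(?P<path>\S+)\s+"
    r"is updated from\s+\w+ to UNAVAILABLE"
)


def unwrap(log_text):
    """Join wrapped log lines into a single string.

    Assumption: a line ending in '-' was split in the middle of a token,
    so the '-' is dropped and the next line is glued on without a space.
    """
    out = ""
    for line in log_text.splitlines():
        piece = line.strip()
        if not piece:
            continue
        if out.endswith("-"):
            out = out[:-1] + piece
        elif out:
            out = out + " " + piece
        else:
            out = piece
    return out


def unavailable_containers(log_text):
    """Return the distinct container paths marked UNAVAILABLE, in log order."""
    seen = []
    for match in STATE_CHANGE.finditer(unwrap(log_text)):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


def audit_commands(paths):
    """Build one 'audit container' command string per container path."""
    return [f"audit container {p} action=scanall" for p in paths]


# Excerpt from the server log in this message, as wrapped by the log display.
SAMPLE = r"""
20-08-2023 19:47:55      ANR0204I The container state for
\\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
                          ncf is updated from AVAILABLE to UNAVAILABLE.
(SESSION:
                          197883)
"""

for command in audit_commands(unavailable_containers(SAMPLE)):
    print(command)
```

The commands are only printed so they can be reviewed before being pasted into 
an administrative session.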

[From the client side:]

During the incr of a large filespace:

Normal File-->         7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx  ** Unsuccessful **
ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
ANS1311E Server out of data storage space

[I ran sel of the latest file. It failed because all containerdirs were 
unavailable.]

ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 
BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.

Total number of objects inspected:            1
Total number of objects backed up:            0
Total number of objects updated:              0
Total number of objects rebound:              0
Total number of objects deleted:              0
Total number of objects expired:              0
Total number of objects failed:               1
 ...
Network data transfer rate:          148.306,35 KB/sec
Aggregate data transfer rate:            211,50 KB/sec
Objects compressed by:                        0%
Total data reduction ratio:                0.23%
Subfile objects reduced by:                   0%
Elapsed processing time:               00:00:32
ANS1311E Server out of data storage space

[Then I updated the containerdirs to readwrite and ran the selective backup. No 
problem]
-----------------------------------------------------------------------------------------------------------
Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx'
Selective Backup function invoked.

Normal File-->         7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx [Sent]
Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
general MRI data\For-Crop-T2W - coronal Copy.pptx' finished without failure.

-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Chavdar Cholev
Sent: Sunday, August 20, 2023 3:43 PM
To: [email protected]
Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Just to make sure that we are on the same page...
You have TSM installed on a VM running on VMware. This VM has a few LUNs 
presented, and those LUNs are used for containers?

Shot in the dark:
1. Check that the VM resources match the IBM TSM blueprint.
2. Check the LUN/HDD response time in Performance Monitor. The response time 
should be around 20-30 ms during the backup operation.
3. Do you know if the HDDs for those LUNs are .vmdk or RDM (raw device mapping)?
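
For item 2, a minimal Python sketch of how exported response-time samples could 
be checked against the upper end of that 20-30 ms range. It assumes the samples 
are per-interval values in seconds (as the Performance Monitor counter 
"Avg. Disk sec/Transfer" reports them); the sample values and the function name 
are hypothetical, not measurements from this server.

```python
from statistics import mean


def slow_samples(latencies_sec, threshold_ms=30.0):
    """Return (index, latency in ms) for every sample above the threshold."""
    return [
        (i, sec * 1000.0)
        for i, sec in enumerate(latencies_sec)
        if sec * 1000.0 > threshold_ms
    ]


# Hypothetical response times in seconds, one per sampling interval.
samples = [0.012, 0.025, 0.041, 0.018, 0.120]

slow = slow_samples(samples)
print(f"mean response time: {mean(samples) * 1000.0:.1f} ms")
print(f"{len(slow)} of {len(samples)} samples above 30 ms")
for index, ms in slow:
    print(f"  sample {index}: {ms:.1f} ms")
```

A few isolated spikes may matter less than sustained high values during the 
backup window, so it is worth looking at when the slow samples occur.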

Thank you!
Chavdar

On Saturday, August 19, 2023, David L.A. De Leeuw <[email protected]> wrote:

> Hi TSM experts,
>
> Our incr backup has failed consistently over the last few days. It starts 
> all right, but after a few gigabytes we get this error on the client:
>
> ANS1301E This operation cannot continue due to an error on the IBM 
> Spectrum Protect server. See your IBM Spectrum Protect server 
> administrator for assistance.
>
> On the server side we see:
>
> 18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
> 18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for node
> MEDFS2 (WinNT) - internal server error detected.
> (SESSION: 194578)
> 18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> CPOOL. (SESSION: 194578)
>
>
> Then we find one or more containers unavailable. We fix the containers 
> with "audit container ... action=scanall"
> No errors are found. But the next backup will fail again.
>
> The server is on 8.1.17, the client as well.
> The containers are on a number of disks on a shared Windows Server 2019.
> There have been some updates on the Windows server recently
> (KB5029247, KB5029647).
>
> The audits are fine, data is accessible, but backups fail.
> Any ideas ?
>
> David de Leeuw
> Ben-Gurion University of the Negev
> Beer Sheva Israel
>
>
