Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

David L.A. De Leeuw Sun, 20 Aug 2023 12:22:58 -0700

Hi all,

Apparently, this has nothing to do with SP at all !


The system holding the containers (Windows Server 2019 on ESXi) just 
disconnects for 5 minutes ! 

No pings to the server. 

When access is restored, later on, a message appears in the events:
"The system time has changed to 2023-08-20T19:05:05 from 2023-08-20T19:01:04  "
This is not even logged as a warning, just as "information". 
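
For reference, the size of the clock jump can be read straight off that event 
message. The following is only a minimal sketch: the helper name clock_jump is 
made up for illustration, and it assumes the message has exactly the 
"changed to ... from ..." wording quoted above.

```python
import re
from datetime import datetime, timedelta

# Event text as quoted above (trailing spaces dropped).
EVENT = ("The system time has changed to 2023-08-20T19:05:05 "
         "from 2023-08-20T19:01:04")

def clock_jump(message: str) -> timedelta:
    """Return how far the clock moved (new time minus old time).

    Hypothetical helper: assumes the 'changed to <new> from <old>'
    wording of the message quoted in this mail.
    """
    match = re.search(r"changed to (\S+) from (\S+)", message)
    if match is None:
        raise ValueError("not a time-change message")
    new_time = datetime.fromisoformat(match.group(1))
    old_time = datetime.fromisoformat(match.group(2))
    return new_time - old_time

if __name__ == "__main__":
    jump = clock_jump(EVENT)
    print(f"clock jumped by {jump.total_seconds():.0f} seconds")
```

For the message above this gives 241 seconds, i.e. about four minutes, which 
is in line with the roughly 5 minute disconnect.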

I have no idea why this happens, but we will find out.
Thanks for your support !

David



-----Original Message-----
From: דוד דה ליאו 
Sent: Sunday, August 20, 2023 9:37 PM
To: ADSM: Dist Stor Manager <[email protected]>
Subject: RE: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Hi Michael,

Thanks a lot.

The SP server is not on a VM, just the storage. I am not the manager of the 
server. 
We just get a lot of backup storage, provided we supply the space for the 
containers.

Sure, we run a lot of sessions in parallel, as you said. I will try a run 
according to your recommendations.
One other thought I am testing: over a year ago we also had crashes. 
The 10 Gb optical network had hiccups, while our 1 Gb line worked fine.
I have just switched back to the 1 Gb line and will see what happens. 

Will keep you posted !

David


-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Michael Prix
Sent: Sunday, August 20, 2023 9:04 PM
To: [email protected]
Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client

Hello David,

  An *SP server in a VM is not the best setup, but it should nevertheless work, 
and it has done so in the past.

For the client: please show the dsm.opt. I suspect you are running several 
sessions from this client in parallel during a backup; stop that for the moment.
Start with a basic dsm.opt, disable the option "resourceutilization", if set, 
and set "memoryefficient yes" (or "diskcachem" if you like). If it still 
crashes with a plain dsm.opt, you should open a ticket with IBM.

-- 
Michael Prix




August 20, 2023 at 7:25 PM, "David L.A. De Leeuw" <[email protected]> wrote:


> 
> Hi Chavdar and Michael,
> 
> Thanks for your thoughts and help.
> 
> I added "memoryefficientbackup". 
> 
> But the sessions still keep crashing. Once a session crashes, I get a whole 
> bunch of errors for storage pool directories, and in fact the whole pool 
> becomes unavailable. 
> I run "update stgpooldir ... access=readwrite" and all is accessible again.
> Some of the containers are in an unavailable state and need an audit. 
> 
> Our container storage is on a Dell PowerEdge R730xd, has 24 CPUs allocated, 
> 64 GB memory, 110 TB disk. The disks are declared as VMDKs. Network is on a 
> 10Gb Intel 82588 card.
> Nothing I can see points to a lack of resources.
> 
> Everything worked fine till 4 days ago. That is why I thought of a problem 
> with Windows updates, but as I rolled them back, that does not make sense.
> 
> I am quite at a loss where to look next ...
> 
> Thanks
> 
> David
> 
> [Server side]
> 20-08-2023 19:47:22 ANR0839I Session 197902 started for node MEDFS2 (WinNT)
>  (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
>  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
> 20-08-2023 19:47:26 ANR8592I Session 197903 connection is using protocol
>  TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
>  certificate TSM Self-Signed Certificate. (SESSION:
>  197903)
> 20-08-2023 19:47:26 ANR0839I Session 197903 started for node MEDFS2 (WinNT)
>  (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
>  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
> 20-08-2023 19:47:55 ANR2012W Error encountered for storage pool directory:
>  \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
>  CPOOL. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR1181E sdtxn.c(1404): Data storage transaction
>  0:83236375 was aborted. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR0204I The container state for
>  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
>  ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
>  197883)
> 20-08-2023 19:47:55 ANR3660E An unexpected error occurred while opening or
>  writing to the container. Container
>  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
>  ncf in stgpool CPOOL has been marked as UNAVAILABLE and
>  should be audited to validate accessibility and content.
>  (SESSION: 197883)
> 
> [From the client side:]
> 
> During the incr of a large filespace:
> 
> Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx ** Unsuccessful **
> ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
> ANS1311E Server out of data storage space
> 
> [I ran sel of the latest file. It failed because all containerdirs were 
> unavailable.]
> 
> ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 
> BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.
> 
> Total number of objects inspected: 1
> Total number of objects backed up: 0
> Total number of objects updated: 0
> Total number of objects rebound: 0
> Total number of objects deleted: 0
> Total number of objects expired: 0
> Total number of objects failed: 1
>  ...
> Network data transfer rate: 148.306,35 KB/sec
> Aggregate data transfer rate: 211,50 KB/sec
> Objects compressed by: 0%
> Total data reduction ratio: 0.23%
> Subfile objects reduced by: 0%
> Elapsed processing time: 00:00:32
> ANS1311E Server out of data storage space
> 
> [Then I updated the containerdirs to readwrite and ran the selective backup. 
> No problem]
> -----------------------------------------------------------------------------------------------------------
> Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx'
> Selective Backup function invoked.
> 
> Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx [Sent]
> Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 BU - 
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder 
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and 
> general MRI data\For-Crop-T2W - coronal Copy.pptx' finished without failure.
> 
> -----Original Message-----
> From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Chavdar 
> Cholev
> Sent: Sunday, August 20, 2023 3:43 PM
> To: [email protected]
> Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client
> 
> Just to make sure that we are on the same page...
> You have TSM installed on a VM running on VMware. This VM has a few LUNs 
> presented, and those LUNs are used for containers?
> 
> Shot in the dark:
> 1. Check whether the VM resources match the IBM TSM Blueprint.
> 2. Check the LUN/HDD response time in Performance Monitor. The response time 
> should be around 20-30 ms during the backup operation.
> 3. Do you know if the HDDs for those LUNs are .vmdk or RDM (raw device mapping)?
> 
> Thank you!
> Chavdar
> 
> On Saturday, August 19, 2023, David L.A. De Leeuw <[email protected]> wrote:
> 
> > 
> > Hi TSM experts,
> > 
> >  Our incr backup has failed consistently in the last few days. It starts 
> >  all right, but after a few gigabytes we get this error on the client:
> > 
> >  ANS1301E This operation cannot continue due to an error on the IBM 
> >  Spectrum Protect server. See your IBM Spectrum Protect server 
> >  administrator for assistance.
> > 
> >  On the server side we see:
> > 
> >  18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> >  \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> >  CPOOL. (SESSION: 194578)
> >  18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for 
> >  node
> >  MEDFS2 (WinNT) - internal server error detected.
> >  (SESSION: 194578)
> >  18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> >  \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> >  CPOOL. (SESSION: 194578)
> > 
> >  Then we find one or more containers unavailable. We fix the containers 
> >  with "audit container ... action=scanall"
> >  No errors are found. But the next backup will fail again.
> > 
> >  The server is on 8.1.17, the client as well.
> >  The containers are on a number of disks on a shared Windows Server 2019.
> >  There have been some updates on the Windows server recently.
> >  (KB5029247,KB5029647)
> > 
> >  The audits are fine, data is accessible, but backups fail.
> >  Any ideas ?
> > 
> >  David de Leeuw
> >  Ben-Gurion University of the Negev
> >  Beer Sheva Israel
> >
>
