Re: INCR backups fail ! TSM 8.1.17 Windows Server and client

Chavdar Cholev Mon, 21 Aug 2023 08:12:01 -0700

Hi David,
Just make sure that the containers are excluded from anti-virus scanning.


On Sunday, August 20, 2023, David L.A. De Leeuw <[email protected]> wrote:

> Hi all,
>
> Apparently, this has nothing to do with SP at all !
>
> The system holding the containers (Windows Server 2019 on ESXi) just
> disconnects for 5 minutes !
>
> No pings to the server.
>
> When access is restored, later on, a message appears in the events:
> "The system time has changed to 2023-08-20T19:05:05 from
> 2023-08-20T19:01:04  "
> This is not even a warning, just "information".
>
> I have no idea why this should happen, but we will find it.
> Thanks for your support !
>
> David
>
>
>
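The size of the clock jump in the event message quoted above can be worked out from its two timestamps. A minimal sketch, assuming the message text is exactly as David quoted it (joined onto one line); the function name clock_jump_seconds is hypothetical:

```python
import re
from datetime import datetime

# Event text as quoted in the message above, joined onto one line.
EVENT = ("The system time has changed to 2023-08-20T19:05:05 "
         "from 2023-08-20T19:01:04")

def clock_jump_seconds(message: str) -> float:
    """Return how far the clock moved (new time minus old time), in seconds."""
    match = re.search(r"changed to (\S+)\s+from (\S+)", message)
    if match is None:
        raise ValueError("not a 'system time has changed' message")
    new_time = datetime.fromisoformat(match.group(1))
    old_time = datetime.fromisoformat(match.group(2))
    return (new_time - old_time).total_seconds()

print(clock_jump_seconds(EVENT))  # 241.0, i.e. 4 minutes 1 second
```

A jump of about four minutes is at least consistent with the roughly five-minute disconnect David describes, though the event alone does not say what caused it.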
> -----Original Message-----
> From: דוד דה ליאו
> Sent: Sunday, August 20, 2023 9:37 PM
> To: ADSM: Dist Stor Manager <[email protected]>
> Subject: RE: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and
> client
>
> Hi Michael,
>
> Thanks a lot.
>
> The SP server is not on a VM, just the storage. I am not the manager of the
> server.
> Just got a lot of backup storage if we provide the space for the
> containers.
>
> Sure we run a lot of sessions in parallel as you said. I will try a run
> according to your recommendations.
> One other thought I am testing: over a year ago we also had
> crashes. The 10 Gb optical network had hiccups. Our 1 Gb line worked fine.
> I just switched back to the 1 Gb line and will see what happens.
>
> Will keep you posted !
>
> David
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Michael
> Prix
> Sent: Sunday, August 20, 2023 9:04 PM
> To: [email protected]
> Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and
> client
>
> Hello David,
>
>   an *SP-Server in a VM is not the best setup, but nevertheless it should
> work - and has proven so in the past.
>
> For the client: Please show the dsm.opt. I suspect you are running
> several sessions from this client in parallel during a backup -> stop that
> for the moment.
> Start with a basic dsm.opt, disable the option "resourceutilization", if
> set, and set "memoryefficientbackup yes" (or "diskcachemethod" if you like).
> If it still crashes with a plain dsm.opt, you should open a ticket with IBM.
>
> --
> Michael Prix
>
>
>
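Michael's suggestion above amounts to checking two client options. A rough sketch of such a check, assuming dsm.opt holds one "OPTION value" pair per line with "*" starting a comment line. The helper name and the sample contents are made up for illustration, and option names are matched only in the spelled-out forms shown, not in every abbreviation the client accepts:

```python
def read_options(text: str) -> dict[str, str]:
    """Parse dsm.opt-style text into {lowercased option name: value}."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):   # blank line or comment
            continue
        parts = line.split(None, 1)            # split on any whitespace
        options[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return options

# Hypothetical dsm.opt contents, not taken from the thread.
SAMPLE = """\
* sample client options
TCPSERVERADDRESS      tsm.example.org
RESOURCEUTILIZATION   10
MEMORYEFFICIENTBACKUP yes
"""

opts = read_options(SAMPLE)
if "resourceutilization" in opts:
    print("resourceutilization is set to", opts["resourceutilization"])
print("memoryefficientbackup =", opts.get("memoryefficientbackup", "(not set)"))
```

With a file like the sample, the check would report that resourceutilization is set, which is the option Michael suggests disabling first.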
>
> August 20, 2023 at 7:25 PM, "David L.A. De Leeuw" <[email protected]> wrote:
>
>
> >
> > Hi Chavdar and Michael,
> >
> > Thanks for your thoughts and help.
> >
> > I added "memoryefficientbackup".
> >
> > But the sessions still keep crashing. Once a session crashes, I get a
> whole lot of errors for storage pool directories, and in fact the whole
> pool becomes unavailable.
> > I run "update stgpooldir ... access=readwrite" and all is accessible
> again.
> > Some of the containers are in unavailable state and need audit.
> >
> > Our container storage is on a Dell PowerEdge R730xd with 24 CPUs
> allocated, 64 GB memory and 110 TB disk. The disks are declared as VMDKs.
> The network is on a 10Gb Intel 82588 card.
> > Nothing I can see points to a lack of resources.
> >
> > Everything worked fine till 4 days ago. That is why I thought of a
> problem with Windows updates, but as I rolled them back, that does not make
> sense.
> >
> > I am quite at a loss where to look next ...
> >
> > Thanks
> >
> > David
> >
> > [Server Side] .
> > 20-08-2023 19:47:22 ANR0839I Session 197902 started for node MEDFS2
> (WinNT)
> >  (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
> >  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
> > 20-08-2023 19:47:26 ANR8592I Session 197903 connection is using protocol
> >  TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
> >  certificate TSM Self-Signed Certificate. (SESSION:
> >  197903)
> > 20-08-2023 19:47:26 ANR0839I Session 197903 started for node MEDFS2
> (WinNT)
> >  (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
> >  STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197903)
> > 20-08-2023 19:47:55 ANR2012W Error encountered for storage pool
> directory:
> >  \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
> >  CPOOL. (SESSION: 197881)
> > 20-08-2023 19:47:55 ANR1181E sdtxn.c(1404): Data storage transaction
> >  0:83236375 was aborted. (SESSION: 197881)
> > 20-08-2023 19:47:55 ANR0204I The container state for
> >  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
> >  ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
> >  197883)
> > 20-08-2023 19:47:55 ANR3660E An unexpected error occurred while opening
> or
> >  writing to the container. Container
> >  \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
> >  ncf in stgpool CPOOL has been marked as UNAVAILABLE and
> >  should be audited to validate accessibility and content.
> >  (SESSION: 197883)
> >
> > [From the client side:]
> >
> > During the incr of a large filespace:
> >
> > Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx ** Unsuccessful **
> > ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
> > ANS1311E Server out of data storage space
> >
> > [I ran sel of the latest file. It failed because all containerdirs were
> unavailable.]
> >
> > ANS1804E Selective Backup processing of 
> > '\\medfs2\e$\medusers14\angel\17.8.23
> BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with
> failures.
> >
> > Total number of objects inspected: 1
> > Total number of objects backed up: 0
> > Total number of objects updated: 0
> > Total number of objects rebound: 0
> > Total number of objects deleted: 0
> > Total number of objects expired: 0
> > Total number of objects failed: 1
> >  ...
> > Network data transfer rate: 148.306,35 KB/sec
> > Aggregate data transfer rate: 211,50 KB/sec
> > Objects compressed by: 0%
> > Total data reduction ratio: 0.23%
> > Subfile objects reduced by: 0%
> > Elapsed processing time: 00:00:32
> > ANS1311E Server out of data storage space
> >
> > [Then I updated the containerdirs to readwrite and ran the selective
> backup. No problem]
> > ------------------------------------------------------------
> -----------------------------------------------
> > Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx'
> > Selective Backup function invoked.
> >
> > Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx [Sent]
> > Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 BU
> - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI
> and general MRI data\For-Crop-T2W - coronal Copy.pptx' finished without
> failure.
> >
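The recovery steps David describes above (find the containers that ANR0204I reports as UNAVAILABLE, then audit them) could be scripted roughly as follows. This is a sketch under assumptions: the activity-log text has already been un-wrapped so that each message sits on one line, the regular expression is fitted only to the ANR0204I wording quoted in this thread, and the function name is hypothetical. It only prints candidate "audit container" commands for review and does not talk to the server.

```python
import re

# ANR0204I message from the thread, un-wrapped onto a single line.
LOG = (r"20-08-2023 19:47:55 ANR0204I The container state for "
       r"\\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.ncf "
       r"is updated from AVAILABLE to UNAVAILABLE. (SESSION: 197883)")

PATTERN = re.compile(
    r"ANR0204I The container state for (\S+) is updated from \S+ to UNAVAILABLE")

def audit_commands(log_text: str) -> list[str]:
    """Build one 'audit container' command per container marked UNAVAILABLE."""
    containers = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match and match.group(1) not in containers:  # skip duplicates
            containers.append(match.group(1))
    return [f"audit container {name} action=scanall" for name in containers]

for command in audit_commands(LOG):
    print(command)
```

The generated lines use the same "audit container ... action=scanall" form David mentions in his first message; whether scanall is the right action for a given container is a judgment left to the administrator.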
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of
> Chavdar Cholev
> > Sent: Sunday, August 20, 2023 3:43 PM
> > To: [email protected]
> > Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and
> client
> >
> > Just to make sure that we are on the same page...
> > You have TSM installed on a VM running on VMware. This VM has a few LUNs
> presented, and those LUNs are used for containers?
> >
> > Shot in the dark:
> > 1. Check whether the VM resources match the IBM TSM Blueprint.
> > 2. Check the LUN/HDD response time in Performance Monitor. The response
> time should be around 20-30 ms during the backup operation.
> > 3. Do you know if the HDDs for those LUNs are .vmdk or RDM (raw device mapping)?
> >
> > Thank you!
> > Chavdar
> >
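Chavdar's second check can be expressed as a simple threshold test. A sketch, assuming latency samples were exported from Performance Monitor in seconds (as the "Avg. Disk sec/Transfer" counter reports them) and taking 30 ms, the upper end of the range he mentions, as the limit. The sample values and the function name are invented for illustration:

```python
LIMIT_MS = 30.0  # upper end of the 20-30 ms range mentioned above

def slow_samples(samples_sec: list[float], limit_ms: float = LIMIT_MS) -> list[float]:
    """Return the samples, converted to milliseconds, that exceed the limit."""
    as_ms = [round(s * 1000, 1) for s in samples_sec]
    return [ms for ms in as_ms if ms > limit_ms]

# Invented readings in seconds, not measurements from the thread.
readings = [0.012, 0.025, 0.180, 0.031]
print(slow_samples(readings))
```

Readings that stay above the limit for the duration of a backup would point toward the storage path rather than the backup software.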
> > On Saturday, August 19, 2023, David L.A. De Leeuw <[email protected]>
> wrote:
> >
> > >
> > > Hi TSM experts,
> > >
> > >  Our incr backup has failed consistently in the last few days. It starts
> > >  all right, but after a few gigabytes on the client we get the error:
> > >
> > >  ANS1301E This operation cannot continue due to an error on the IBM
> > >  Spectrum Protect server. See your IBM Spectrum Protect server
> > >  administrator for assistance.
> > >
> > >  On the server side we see:
> > >
> > >  18-08-2023 22:57:25 ANR2012W Error encountered for storage pool
> directory:
> > >  \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> > >  CPOOL. (SESSION: 194578)
> > >  18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578
> for
> > >  node
> > >  MEDFS2 (WinNT) - internal server error detected.
> > >  (SESSION: 194578)
> > >  18-08-2023 22:57:26 ANR2012W Error encountered for storage pool
> directory:
> > >  \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> > >  CPOOL. (SESSION: 194578)
> > >
> > >  Then we find one or more containers unavailable. We fix the
> containers
> > >  with "audit container ... action=scanall"
> > >  No errors are found. But the next backup will fail again.
> > >
> > >  The server is on 8.1.17, the client as well.
> > >  The containers are on a number of disks on a shared Windows Server
> 2019.
> > >  There have been some updates on the Windows server recently.
> > >  (KB5029247,KB5029647)
> > >
> > >  The audits are fine, data is accessible, but backups fail.
> > >  Any ideas ?
> > >
> > >  David de Leeuw
> > >  Ben-Gurion University of the Negev
> > >  Beer Sheva Israel
> > >
> >
>
