Hello David, an *SP-Server in a VM is not the best setup, but nevertheless it should work - and has proven so in the past.
For the client: Please show the dsm.opt. I suspect you are running several sessions from this client in parallel during a backup -> stop that for the moment. Start with a basic dsm.opt, disable the option "resourceutilization", if set, and set "memoryefficientbackup yes" (or "memoryefficientbackup diskcachemethod" if you like). If it still crashes with a plain dsm.opt, you should open a ticket with IBM.

--
Michael Prix

August 20, 2023 at 7:25 PM, "David L.A. De Leeuw" <da...@bgu.ac.il> wrote:
>
> Hi Chavdar and Michael,
>
> Thanks for your thoughts and help.
>
> I added "memoryefficientbackup".
>
> But still the sessions keep crashing. Once the session crashes, I get a whole
> bit of errors for storage pool directories, and in fact the whole pool
> becomes unavailable.
> I run "update stgpooldir ... access=readwrite" and all is accessible again.
> Some of the containers are in unavailable state and need audit.
>
> Our container storage is on a Dell PowerEdge R730xd, has 24 CPU's allocated,
> 64 GB memory, 110 TB disk. The disks are declared as VMDKs. Network is on a
> 10Gb Intel 82588 card.
> Nothing I can see points to a lack of resources.
>
> Everything worked fine till 4 days ago. That is why I thought of a problem
> with Windows updates, but as I rolled them back, that does not make sense.
>
> I am quite at a loss where to look next ...
>
> Thanks
>
> David
>
> [Server Side] .
> 20-08-2023 19:47:22 ANR0839I Session 197902 started for node MEDFS2 (WinNT)
> (SSL medspice.bgu.ac.il[132.72.73.246]:53184) on
> STOREWARE13.auth.ad.bgu.ac.il:1502. (SESSION: 197902)
> 20-08-2023 19:47:26 ANR8592I Session 197903 connection is using protocol
> TLSV13, cipher specification TLS_AES_256_GCM_SHA384,
> certificate TSM Self-Signed Certificate. (SESSION:
> 197903)
> 20-08-2023 19:47:26 ANR0839I Session 197903 started for node MEDFS2 (WinNT)
> (SSL medspice.bgu.ac.il[132.72.73.246]:53185) on
> STOREWARE13.auth.ad.bgu.ac.il:1502.
> (SESSION: 197903)
> 20-08-2023 19:47:55 ANR2012W Error encountered for storage pool directory:
> \\medbackup.med.ad.bgu.ac.il\tsmc20 in storage pool:
> CPOOL. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR1181E sdtxn.c(1404): Data storage transaction
> 0:83236375 was aborted. (SESSION: 197881)
> 20-08-2023 19:47:55 ANR0204I The container state for
> \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
> ncf is updated from AVAILABLE to UNAVAILABLE. (SESSION:
> 197883)
> 20-08-2023 19:47:55 ANR3660E An unexpected error occurred while opening or
> writing to the container. Container
> \\medbackup.med.ad.bgu.ac.il\tsmc17\18\0000000000001853.-
> ncf in stgpool CPOOL has been marked as UNAVAILABLE and
> should be audited to validate accessibility and content.
> (SESSION: 197883)
>
> [From the client side:]
>
> During the incr of a large filespace:
>
> Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx ** Unsuccessful **
> ANS1228E Sending of object '\\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx' failed.
> ANS1311E Server out of data storage space
>
> [I ran sel of the latest file. It failed because all containerdirs were
> unavailable.]
>
> ANS1804E Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23
> BU - E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx' finished with failures.
>
> Total number of objects inspected: 1
> Total number of objects backed up: 0
> Total number of objects updated: 0
> Total number of objects rebound: 0
> Total number of objects deleted: 0
> Total number of objects expired: 0
> Total number of objects failed: 1
> ...
> Network data transfer rate: 148.306,35 KB/sec
> Aggregate data transfer rate: 211,50 KB/sec
> Objects compressed by: 0%
> Total data reduction ratio: 0.23%
> Subfile objects reduced by: 0%
> Elapsed processing time: 00:00:32
> ANS1311E Server out of data storage space
>
> [Then I updated the containerdirs to readwrite and ran the selective backup.
> No problem]
> -----------------------------------------------------------------------------------------------------------
> Protect> sel '\\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx'
> Selective Backup function invoked.
>
> Normal File--> 7.132.827 \\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx [Sent]
> Selective Backup processing of '\\medfs2\e$\medusers14\angel\17.8.23 BU -
> E\MyDocs(E)-PrevOLD-D\MyDocs (D)\PERSON-CRITER\FAMILY\OMRI's folder
> 313843070\OMRI 1-16 medical issue\MRIs - CTs - OMRI\MY PROCESSING of MRI and
> general MRI data\For-Crop-T2W - coronal Copy.pptx' finished without failure.
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Chavdar
> Cholev
> Sent: Sunday, August 20, 2023 3:43 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] INCR backups fail ! TSM 8.1.17 Windows Server and client
>
> Just to make sure that we are on the same page...
> You have TSM installed on VM running on VMware.
> This VM has a few LUNs
> presented and those LUNs are used for containers?
>
> Shot in the dark:
> 1. Check whether the VM resources match the IBM TSM Blueprint.
> 2. Check LUNs/HDDs response time in perf. monitor. The response time should
> be around 20-30 ms during the backup operation.
> 3. Do you know if those HDDs for LUNs are .vmdk or RDM (raw device mapping)?
>
> Thank you!
> Chavdar
>
> On Saturday, August 19, 2023, David L.A. De Leeuw <da...@bgu.ac.il> wrote:
>
> >
> > Hi TSM experts,
> >
> > Our incr backup fails consistently in the last few days. It starts
> > alright but after a few gigabytes on the client we get the error:
> >
> > ANS1301E This operation cannot continue due to an error on the IBM
> > Spectrum Protect server. See your IBM Spectrum Protect server
> > administrator for assistance.
> >
> > On the server side we see:
> >
> > 18-08-2023 22:57:25 ANR2012W Error encountered for storage pool directory:
> > \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> > CPOOL. (SESSION: 194578)
> > 18-08-2023 22:57:25 ANR0530W Transaction failed for session 194578 for
> > node
> > MEDFS2 (WinNT) - internal server error detected.
> > (SESSION: 194578)
> > 18-08-2023 22:57:26 ANR2012W Error encountered for storage pool directory:
> > \\medbackup.med.ad.bgu.ac.il\tsmc1 in storage pool:
> > CPOOL. (SESSION: 194578)
> >
> > Then we find one or more containers unavailable. We fix the containers
> > with "audit container ... action=scanall"
> > No errors are found. But the next backup will fail again.
> >
> > The server is on 8.1.17, the client as well.
> > The containers are on a number of disks on a shared windows server 2019.
> > There have been some updates on the windows server recently.
> > (KB5029247,KB5029647)
> >
> > The audits are fine, data is accessible, but backups fail.
> > Any ideas ?
> >
> > David de Leeuw
> > Ben-Gurion University of the Negev
> > Beer Sheva Israel
> >
>
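
For reference, a plain dsm.opt along the lines suggested at the top of this thread might look like the sketch below. The node name, server address, port and the use of SSL are taken from the server log quoted above. COMMMETHOD and PASSWORDACCESS are assumed values, and the commented-out RESOURCEUTILIZATION number is only a placeholder, since the actual dsm.opt was not posted. Lines starting with * are comments.

```
* Minimal dsm.opt sketch for a test backup (assumed values, adjust to the real setup)
COMMMETHOD            TCPIP
TCPSERVERADDRESS      STOREWARE13.auth.ad.bgu.ac.il
TCPPORT               1502
SSL                   YES
NODENAME              MEDFS2
PASSWORDACCESS        GENERATE

* Leave RESOURCEUTILIZATION unset so the client falls back to its default
* number of sessions (the value 10 is a placeholder for whatever was set)
* RESOURCEUTILIZATION 10

* Reduce client memory use during incremental backup; use one of the two
MEMORYEFFICIENTBACKUP YES
* MEMORYEFFICIENTBACKUP DISKCACHEMETHOD
```

If the incremental backup still fails with an options file this small, client-side tuning is probably not the cause, and the storage pool directory errors on the server side are the better place to keep looking.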