Another point of interest is the archlog filesystem. We originally had it at 300GB, but it kept overflowing and crashing because the DB backups that trigger at 80% full wouldn't finish (>5 hours) before it reached 100%. So we recently increased it to 1TB. Now the latest DB backup has been running for >24 hours, and I have been sitting here watching the archlog filesystem %used go from 80% down to 38%. It is taking a long, long time to empty, even with nothing running but the DB backup. With nothing but the DB backup (and archlog flushing) running, the load average is still >25.
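To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the archlog filesystem is exactly 1TB (taken as 1000GB), that the drop from 80% to 38% took exactly 24 hours (it was really somewhat longer, so the true rate is a bit lower), and the 16 hardware threads mentioned further down the thread. The function names are only for illustration.

```python
def drain_rate_gb_per_hour(size_gb, start_pct, end_pct, hours):
    """Average rate at which the filesystem emptied, in GB per hour."""
    freed_gb = size_gb * (start_pct - end_pct) / 100.0
    return freed_gb / hours


def gb_per_hour_to_mb_per_second(rate_gb_per_hour):
    """Convert GB/hour to MB/second, using 1 GB = 1000 MB."""
    return rate_gb_per_hour * 1000.0 / 3600.0


def load_per_thread(load_average, threads):
    """Load average relative to the number of hardware threads."""
    return load_average / threads


# Assumed figures: 1000 GB filesystem, 80% -> 38% used, 24 hours elapsed.
rate = drain_rate_gb_per_hour(1000, 80, 38, 24)
mb_per_s = gb_per_hour_to_mb_per_second(rate)

# Assumed figures: load average of 25 on 16 hardware threads.
ratio = load_per_thread(25, 16)

print(f"archlog drain rate: {rate:.1f} GB/hour ({mb_per_s:.2f} MB/s)")
print(f"load per hardware thread: {ratio:.2f}")
```

Under those assumptions the archlog is draining at well under 20GB per hour, and a load average of 25 is more than one and a half times the 16 threads available.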
I really think the additional memory is killing this box. It was never this slow or overloaded before!

On Wed, Jul 26, 2017 at 8:26 AM, Stefan Folkerts <[email protected]> wrote:

> Oh, I just now read the 16 threads correctly, I was thinking you wrote 16 cores!
> 8 cores is far below specification if you're running M-size blueprint ingest figures.
> I've seen 16-core Intel servers (2016 spec Xeon CPUs) go up to 70% utilization, so that kind of load would never work on 8 cores, but again, I don't know how much managed data you have and what your ingest figures are.
>
> On Wed, Jul 26, 2017 at 2:02 PM, Zoltan Forray <[email protected]> wrote:
>
> > I kinda feel the same way since my networking folks say it isn't the 10G links (Xymon shows peaks of 2Gb), even though at its peak processing load it would be handling 5 TSM servers sending replications across the same 10G links also used for the NFS.
> >
> > If the current processes ever finish (delete of 9M objects is now into 48 hours), I will let the server sit for a day or two to see if it improves. I have noticed that even with the server idle (no processes or sessions), the CPU load average was still higher than the 16 threads available. I am seriously thinking about going back to the original 96GB of RAM since it seems a lot of this slowdown started after bumping to 192GB.
> >
> > On Wed, Jul 26, 2017 at 3:16 AM, Stefan Folkerts <[email protected]> wrote:
> >
> > > Interesting, why would NFS be the problem if the deletion of objects doesn't really touch the storagepools?
> > >
> > > I would wager that a straight-up dd on the system to create a large file via 10Gb/s on NFS would be blazing fast, but the database backup is slow because it's almost never idle, it's always behind its internal processes such as reorgs.
> > >
> > > Place your bets! :-)
> > >
> > > http://www.strawpoll.me/13536369
> > >
> > > On Mon, Jul 24, 2017 at 3:55 PM, Sasa Drnjevic <[email protected]> wrote:
> > >
> > > > Not sure of course... But I would blame NFS.
> > > >
> > > > Did you check the negotiated speed of your NFS eth 10G ifaces?
> > > > And that network?
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Sasa Drnjevic
> > > > www.srce.unizg.hr
> > > >
> > > > On 24.7.2017. 15:49, Zoltan Forray wrote:
> > > > > 8-cores/16-threads. It wasn't bad when it was replicating from 4 SP/TSM servers. We had to stop all replication due to running out of space, and until I finish this cleanup I have been holding off replication. So the deletion has been running standalone.
> > > > >
> > > > > I forgot to mention that DB backups are also running very long. A 1.5TB DB backup runs 8+ hours to NFS storage. These are connected via 10G.
> > > > >
> > > > > On Mon, Jul 24, 2017 at 9:41 AM, Sasa Drnjevic <[email protected]> wrote:
> > > > >
> > > > >> On 24.7.2017. 15:25, Zoltan Forray wrote:
> > > > >>> Due to lack of resources, we have had to stop replication on one of our SP servers. The replication target server is 7.1.6.3 RHEL 7, Dell T710 with 192GB RAM. NFS/ISILON storage.
> > > > >>>
> > > > >>> After removing replication from the nodes on the source server, I have been cleaning up the replication server by deleting the filespaces for the nodes we are no longer replicating.
> > > > >>>
> > > > >>> My issue is that the delete filespaces on the replication server is taking forever. It took over a week to delete one filespace with 31 million objects?
> > > > >>
> > > > >> That is definitely tooooo loooong :-(
> > > > >>
> > > > >> It would take 6-8 hrs max in my environment, even under "standard" load...
> > > > >>
> > > > >> How many CPU cores does it have?
> > > > >>
> > > > >> And how is/was it performing the role of a target repl. server, performance-wise?
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> --
> > > > >> Sasa Drnjevic
> > > > >> www.srce.unizg.hr
> > > > >>
> > > > >>> To me it is highly unusual to take this long. Your thoughts on this?
> > > > >>>
> > > > >>> --
> > > > >>> *Zoltan Forray*
> > > > >>> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > >>> Xymon Monitor Administrator
> > > > >>> VMware Administrator
> > > > >>> Virginia Commonwealth University
> > > > >>> UCC/Office of Technology Services
> > > > >>> www.ucc.vcu.edu
> > > > >>> [email protected] - 804-828-4807
> > > > >>> Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://infosecurity.vcu.edu/phishing.html
> > > > >
> > > > > --
> > > > > *Zoltan Forray*
> >
> > --
> > *Zoltan Forray*

--
*Zoltan Forray*
