On 03.11.20 14:46, Grzegorz Powiedziuk wrote:
> Hi, I could use some ideas. We moved a huge db2 from old p7 aix to rhel7 on
> Z and we are having big performance issues.
> Same memory; the CPU count is down from 12 to 10, although they had
> multithreading ON so they saw more "cpus". We have faster disks (moved to
> flash), faster FCP cards and faster network adapters.
> We are running on z114 and at this point that is practically the only VM
> running with IFLs on this box.
>
> It seems that when "jobs" run on their own, they finish faster than what
> they were getting on AIX.
> But problems start when there is more than we can chew: either a few jobs
> running at the same time or some reorgs running in the database.
>
> Load average goes to 150-200, cpus are at 100%  (kernel time can go to
> 20-30% ) but no iowaits.
> Plenty of memory available.
> At this point everything becomes extremely slow, people start having
> problems connecting to db2 (and sshing); basically it becomes a
> nightmare.
>
> This db2 is massive (30+TB) and it is a multinode configuration (17 nodes
> running on the same host). We moved it like this 1:1 from that old AIX.
>
> DB2 is running on ext4 filesystems (actually a huge number of
> filesystems; each NODE is a separate logical volume), with separate ones
> for logs and data.
>
> If this continues like this, we will add 2 cpus but I have a feeling that
> it will not make much difference.
>
> I know that we end up with a massive number of processes and a massive
> number of file descriptors (lsof, since it now also shows threads, is
> practically useless; it would run for way too long, probably 10-30
> minutes).
>
> A snapshot from just now:
>
> top - 08:37:50 up 11 days, 12:04, 28 users,  load average: 188.29, 151.07, 133.54
> Tasks: 1843 total,  11 running, 1832 sleeping,   0 stopped,   0 zombie
> %Cpu0  : 76.3 us, 16.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  3.2 si,  2.9 st
> %Cpu1  : 66.1 us, 31.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.3 si,  0.6 st
> %Cpu2  : 66.9 us, 31.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  1.3 si,  0.3 st
> %Cpu3  : 74.7 us, 23.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  1.3 si,  0.3 st
> %Cpu4  : 86.7 us, 10.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.3 si,  0.6 st
> %Cpu5  : 83.8 us, 13.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.6 si,  0.3 st
> %Cpu6  : 81.6 us, 15.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.9 si,  0.6 st
> %Cpu7  : 70.6 us, 26.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.9 si,  0.6 st
> %Cpu8  : 70.5 us, 26.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.6 hi,  1.6 si,  0.6 st
> %Cpu9  : 84.1 us, 13.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi,  1.3 si,  0.6 st
> KiB Mem : 15424256+total,  1069280 free, 18452168 used, 13472112+buff/cache
> KiB Swap: 52305904 total, 51231216 free,  1074688 used. 17399028 avail Mem

So at least at some point you paged out memory.
If you have sysstat installed, it would be good to get some history data on
CPU and swap usage.

You can also run "vmstat 1 -w" to get a live view of the system load.
Can you also check (as root)
/sys/kernel/debug/diag_stat
two times and see whether you see excessive diagnose 9c rates?
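To turn the two readings into a rate, a minimal sketch (the awk pattern for
the diagnose 0x9c line is an assumption; the exact layout of
/sys/kernel/debug/diag_stat varies by kernel version, so adjust it to your
output):

```shell
# diag9c_rate BEFORE AFTER SECONDS -> average diagnose 9c calls per second
diag9c_rate() {
    before=$1; after=$2; secs=$3
    echo $(( (after - before) / secs ))
}

# Usage as root (hypothetical; assumes a "9c" line with per-CPU counter
# columns -- check the real file layout on your kernel first):
#   c1=$(awk '/9c/ {for (i = 2; i <= NF; i++) s += $i} END {print s}' \
#        /sys/kernel/debug/diag_stat)
#   sleep 10
#   c2=$(awk '/9c/ {for (i = 2; i <= NF; i++) s += $i} END {print s}' \
#        /sys/kernel/debug/diag_stat)
#   diag9c_rate "$c1" "$c2" 10

# Made-up sample counts, purely to show the arithmetic:
diag9c_rate 120000 480000 10   # -> 36000 calls/s
```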

>
> Where can I look for potential relief? Everyone was hoping for better
> performance, not worse. I am hoping that there is something we can tweak to
> make this better.
> I will appreciate any ideas!

I agree this should have gotten faster, not slower.

If you have an IBM service contract (or any other vendor that provides support)
you could open a service ticket to get this analysed.

Christian

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www2.marist.edu/htbin/wlvindex?LINUX-390
