Hi Olaf,
Thanks, and sorry for taking so long to reply. We've been testing several ways to provide this information to the user. Let me give you more details.
There's a corporate SAS Grid consolidating SAS applications from several business areas, all of them using the same saswork filesystem. So the idea is to provide a way to identify the top processes or users doing the most I/O, in terms of throughput or IOPS. We have tested the following:
- fileheat and policy engine to identify the most active files: We first activated fileheat by executing the command
# mmchconfig fileHeatLossPercent=25,fileHeatPeriodMinutes=720
After that the SAS admin started to run a job, and then we created the following policy to see if we could detect the corresponding SAS file:
rule 'fileheatlist' list 'hotfiles' weight(FILE_HEAT)
SHOW( HEX( XATTR( 'gpfs.FileHeat' )) ||
' A=' || varchar(ACCESS_TIME) ||
' K=' || varchar(KB_ALLOCATED) ||
' H=' || varchar(FILE_HEAT) ||
' U=' || varchar(USER_ID) ||
' G=' || varchar(GROUP_ID) ||
' FZ=' || varchar(FILE_SIZE) ||
' CT=' || varchar(CREATION_TIME) ||
' CHT=' || varchar(CHANGE_TIME) ||
' M=' || varchar(MODIFICATION_TIME) )
where FILE_HEAT != 0.0
Then, we executed the command:
# mmapplypolicy -P policy-file-heat.txt -I defer -f test1
I don't know why, but it always reported that zero files were selected. I don't know what's missing, or if that's just the way it is.
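Two things we would check here, as a sketch rather than a definitive diagnosis. First, if we understand the mechanism correctly, fileHeat is driven by read I/O that actually reaches disk, so files the job has only written may legitimately show zero heat until they are read back. Second, mmapplypolicy can be re-run with its standard -I test and -L options to show the per-file rule decisions ('saswork' is an assumed device name here; substitute the real one):

```shell
# Hedged debugging sketch: evaluate the same policy in test mode with
# higher verbosity. -I test evaluates rules without acting on files;
# -L 3 lists every scanned file and whether it matched a rule.
mmapplypolicy saswork -P policy-file-heat.txt -I test -L 3
```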
- Combining mmdiag with a list of files generated by the ILM engine: To get the busiest files we executed the following command:
# mmdiag --iohist verbose > mmdiag--iohist_verbose.out
One way to list the top files was this:
# cat mmdiag--iohist_verbose.out | grep data | awk '{print $10}' | uniq -c | sort -nr | head
7 135003
5 135003
3 135003
2 134985
2 134985
1 64171
1 64094
1 64013
1 46465
1 46465
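One caveat with the pipeline above: uniq -c only collapses adjacent duplicate lines, which is why the same inode (135003) shows up three times in the counts. Sorting the inode column before uniq gives one aggregated count per inode. A small self-contained sketch (the sample lines are made up, but follow the same field layout as the mmdiag --iohist output, with the inode in field 10):

```shell
# Hypothetical sample 'data' lines in the mmdiag --iohist layout;
# field 10 is the inode number, as in the real output above.
sample='03:12:11.0 W data 1:100 8 1.0 cli NSD1 10.0.0.1 135003 0 Sync T
03:12:12.0 W data 1:200 8 1.0 cli NSD1 10.0.0.1 64013 0 Sync T
03:12:13.0 W data 1:300 8 1.0 cli NSD1 10.0.0.1 135003 0 Sync T'

# Sort before uniq -c so duplicates become adjacent, then rank by count.
printf '%s\n' "$sample" | awk '{print $10}' | sort | uniq -c | sort -nr

# Against the real capture, the equivalent would be:
#   grep data mmdiag--iohist_verbose.out | awk '{print $10}' \
#     | sort | uniq -c | sort -nr | head
```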
Another one was to execute the following command:
# cat mmdiag--iohist_verbose.out | grep data | sort -k6 -nr | head
03:12:11.911813 W data 2:132768 8 11.782 cli 0AC3C23C:58AEDD53 10.195.194.60 451799 0 Sync SyncFSWorkerThread
03:12:10.927003 W data 1:5410160 8 11.086 cli 0AC3C23C:58091F75 10.195.194.60 46465 1319 Sync SyncFSWorkerThread
03:12:11.927521 W data 2:113995072 8 7.602 cli 0AC3C23C:58AEDD53 10.195.194.60 451776 1 Sync SyncFSWorkerThread
03:12:10.999507 W data 2:149912432 24 3.830 cli 0AC3C23C:58AEDC8D 10.195.194.60 134985 4 Sync SyncFSWorkerThread
03:12:20.190427 W data 1:40854976 8 3.058 cli 0AC3C23C:58091F75 10.195.194.60 64013 0 Sync SyncFSWorkerThread
03:12:11.923742 W data 2:182741840 8 3.036 cli 0AC3C23C:58AEDD53 10.195.194.60 385976 0 Sync SyncFSWorkerThread
03:12:20.186045 W data 1:41352672 16 2.451 cli 0AC3C23C:58091F1B 10.195.194.60 451774 2 Sync SyncFSWorkerThread
03:12:16.139833 W data 2:149912416 24 1.595 cli 0AC3C23C:58AEDC8D 10.195.194.60 134985 4 Cleaner CleanBufferThread
03:12:21.544674 W data 3:146654840 8 0.873 cli 0AC3C23C:592334F8 10.195.194.60 451780 0 Sync SyncFSWorkerThread
03:12:10.998636 W data 2:149912352 8 0.833 cli 0AC3C23C:58AEDC8D 10.195.194.60 134985 4 Sync SyncFSWorkerThread
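Sorting on field 6 only surfaces the single slowest operations; summing the service time per inode instead shows which file consumed the most I/O time overall. A hedged sketch with made-up sample lines in the same layout as the output above (field 6 = service time in ms, field 10 = inode):

```shell
# Hypothetical sample lines in the mmdiag --iohist layout.
sample='03:12:10.0 W data 2:100 8 11.782 cli NSD1 10.0.0.1 451799 0 Sync T
03:12:11.0 W data 2:200 24 3.830 cli NSD1 10.0.0.1 134985 4 Sync T
03:12:12.0 W data 2:300 24 1.595 cli NSD1 10.0.0.1 134985 4 Cleaner T'

# Sum service time (ms, field 6) per inode (field 10) and rank.
printf '%s\n' "$sample" \
  | awk '$3 == "data" { ms[$10] += $6 } END { for (i in ms) printf "%10.3f ms  inode %s\n", ms[i], i }' \
  | sort -nr

# Against the real capture, feed it:
#   grep data mmdiag--iohist_verbose.out
```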
To discover which filesystem that NSD ID belongs to:
# mmlsnsd -L | grep 58AEDD53
sasconfig nsdconfig0001 0AC3C23C58AEDD53 host1,host2
Then we could run a policy rule to just list the files, here is the policy:
rule 'fileheatlist' list 'hotfiles' weight(FILE_HEAT)
show( ' U=' || varchar(USER_ID) ||
' G=' || varchar(GROUP_ID) ||
' A=' || varchar(ACCESS_TIME) ||
' K=' || varchar(KB_ALLOCATED) ||
' H=' || varchar(computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)) ||
' FZ=' || varchar(FILE_SIZE) ||
' CT=' || varchar(CREATION_TIME) ||
' CHT=' || varchar(CHANGE_TIME) ||
' M=' || varchar(MODIFICATION_TIME) )
# mmapplypolicy sasconfig -P policy-file-heat3.txt -I defer -f teste6
Then we could grep by inode number and see which file it is:
# grep "^451799 " teste6.list.hotfiles
For privacy reasons I won't show the result, but it found the file. The good thing is that this list also provides the UID and GID of the file. We're still waiting for feedback from the SAS admin to see if it's acceptable.
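To close the loop, the per-inode counts from the mmdiag pipeline can be looked up in the deferred policy list, since the inode number is the first field of each list entry (that's what the grep above relies on). A self-contained sketch with made-up stand-ins for both inputs; the list lines and file paths here are simplified hypotheticals, and the real list carries more fields, but the leading inode is what matters:

```shell
# Hypothetical top-inode counts (from the mmdiag pipeline) and a
# simplified stand-in for the policy list file. In the real case these
# would come from the sorted mmdiag output and teste6.list.hotfiles.
counts='2 135003
1 64013'
listfile='135003 1 0 -- /saswork/job1/work.sas7bdat U=1001 G=100
64013 1 0 -- /saswork/job2/utility.sas7bdat U=1002 G=100'

# For each busy inode, print its count and the matching list entry.
printf '%s\n' "$counts" | while read -r n ino; do
  printf '%s I/Os: ' "$n"
  printf '%s\n' "$listfile" | grep "^$ino " || echo "(not in list)"
done
```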
- dstat with --gpfs-ops and --top-io-adv|--top-bio|--top-io: The problem is that it only shows one process, which isn't enough.
- SystemTap: It didn't work. I think it's because there are no GPFS symbols. If somebody knows how to add GPFS symbols, that would be very handy.
- QOS: We first enabled QOS to just collect filesystem statistics:
# mmchqos saswork --enable --fine-stats 60 --pid-stats yes
Then the SAS admin started another SAS job and we got the PID. Then we ran the following command:
# mmlsqos saswork --fine-stats 2 --seconds 60 | grep SASPID
The PIDs never matched. The PID from ps -ef | grep nodms has 5 digits, while mmlsqos reports PIDs of 8 digits. We have a ticket opened to understand what's happening.
After all this time trying to figure out a way to generate this report, I think the problem is more complex. Even if we get this information, what could we do to put a limit on those processes? I think the best option would be to have AIX servers running WLM, with the saswork filesystems local on each server. That way we could not only monitor but also define classes, shares and limits for I/O. I think Red Hat, or Linux in general, doesn't have a workload manager like AIX's.
Abraços / Regards / Saludos,

Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services

Phone: 55-19-2132-4317
E-mail: [email protected]
----- Original message -----
From: "Olaf Weiser" <[email protected]>
Sent by: [email protected]
To: gpfsug main discussion list <[email protected]>
Cc:
Subject: Re: [gpfsug-discuss] Top files on GPFS filesystem
Date: Mon, Aug 13, 2018 3:10 AM
there's no mm* command to get it cluster wide..
you can use fileheat and policy engine to identify most active files .. and further more... combine it with migration rules ... to replace those files ..
please note.. files, that are accessed very heavily but all requests answered out of pagepool (cached files) .. fileheat doesn't get increased for cache hits... fileheat is only counted for real IOs to the disk... as intended ...
From: "Anderson Ferreira Nobre" <[email protected]>
To: [email protected]
Date: 08/10/2018 08:10 PM
Subject: [gpfsug-discuss] Top files on GPFS filesystem
Sent by: [email protected]
Hi all,
Does anyone know how to list the top files by throughput and IOPS in a single GPFS filesystem like filemon in AIX?
Abraços / Regards / Saludos,
Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services
Phone:55-19-2132-4317
E-mail: [email protected]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
