Simon,

>Question 1.
>Can we force the gateway node for the other file-sets to our "02" node.
>I.e. So that we can get the queue services for the other filesets.
AFM automatically maps the fileset to a gateway node, and today there is no
option available for users to assign a fileset to a particular gateway node.
This feature will be supported in future releases.

>Question 2.
>How can we make AFM actually work for the "facility" file-set. If we shut
>down GPFS on the node, on the secondary node, we'll see log entries like:

>2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
>operations...

>So I'm assuming the massive queue is all file remove operations?

These are files which were created in the cache and were deleted before they
got replicated to home. AFM recovery will delete them locally. Yes, it is
possible that most of these operations are local remove operations. Try
finding those operations using the dump command:

   mmfsadm saferdump afm all | grep 'Remove\|Rmdir' | grep local | wc -l

>Alarmingly, we are also seeing entries like:

>2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
>fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name remote
>error 5

Traces are needed to verify the IO errors. Also try disabling parallel IO
and see if the replication speed improves:

   mmchfileset device fileset -p afmParallelWriteThreshold=disable

~Venkat ([email protected])


From:    "Simon Thompson (IT Research Support)" <[email protected]>
To:      "[email protected]" <[email protected]>
Date:    10/09/2017 06:27 PM
Subject: [gpfsug-discuss] AFM fun (more!)
Sent by: [email protected]

Hi All,

We're having fun (ok not fun ...) with AFM.

We have a file-set where the queue length isn't shortening. Watching it over
5 sec periods, the queue length increases by ~600-1000 items, and the
numExec goes up by about 15k. The queues are steadily rising and we've seen
them over 1000000 ...

This is on one particular fileset e.g.:

mmafmctl rds-cache getstate
Mon Oct 9 08:43:58 2017

Fileset Name           Fileset Target                 Cache State  Gateway Node  Queue Length  Queue numExec
---------------------  -----------------------------  -----------  ------------  ------------  -------------
rds-projects-facility  gpfs:///rds/projects/facility  Dirty        bber-afmgw01       3068953         520504
rds-projects-2015      gpfs:///rds/projects/2015      Active       bber-afmgw01             0              3
rds-projects-2016      gpfs:///rds/projects/2016      Dirty        bber-afmgw01          1482             70
rds-projects-2017      gpfs:///rds/projects/2017      Dirty        bber-afmgw01           713           9104
bear-apps              gpfs:///rds/bear-apps          Dirty        bber-afmgw02             3     2472770871
user-homes             gpfs:///rds/homes              Active       bber-afmgw02             0             19
bear-sysapps           gpfs:///rds/bear-sysapps       Active       bber-afmgw02             0              4

This is having the effect that other filesets on the same "Gateway" are not
getting their queues processed.

Question 1.
Can we force the gateway node for the other file-sets to our "02" node.
I.e. So that we can get the queue services for the other filesets.

Question 2.
How can we make AFM actually work for the "facility" file-set. If we shut
down GPFS on the node, on the secondary node, we'll see log entries like:

2017-10-09_13:35:30.330+0100: [I] AFM: Found 1069575 local remove
operations...

So I'm assuming the massive queue is all file remove operations?

Alarmingly, we are also seeing entries like:

2017-10-09_13:54:26.591+0100: [E] AFM: WriteSplit file system rds-cache
fileset rds-projects-2017 file IDs [5389550.5389550.-1.-1,R] name remote
error 5

Anyone any suggestions?
Thanks

Simon
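A rough way to tell whether that backlog of local removes is actually
draining is to sample the count from the saferdump command above every few
seconds on the gateway node that currently owns the fileset (bber-afmgw01 in
the getstate output). This is only a sketch; saferdump output can vary
between releases:

   # Run on the gateway node owning the fileset.
   # Prints a timestamped count of queued local Remove/Rmdir operations
   # every 5 seconds, so it is obvious whether the backlog is shrinking.
   while true; do
       printf '%s local removes queued: ' "$(date +%H:%M:%S)"
       mmfsadm saferdump afm all | grep 'Remove\|Rmdir' | grep local | wc -l
       sleep 5
   done

The per-fileset queue can be watched the same way with
mmafmctl rds-cache getstate -j rds-projects-facility (where the -j fileset
option is available), which makes it easier to see whether the queue length
falls while numExec rises.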
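If parallel IO does get disabled as suggested, the fileset's AFM attributes
can be listed afterwards to confirm the change took effect. Assuming the
--afm option of mmlsfileset behaves this way on the release in use,
something like:

   # List the AFM attributes of the affected fileset (device and fileset
   # names taken from the getstate output above); the parallel write
   # threshold should appear among them where supported.
   mmlsfileset rds-cache rds-projects-2017 --afm -L

The "remote error 5" in the WriteSplit messages appears to be errno 5 (EIO),
which is why traces are needed to see where the IO error originates,
regardless of the parallel IO setting.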
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
