Thank you. I am sorry if I was not clear: the metadata pool is entirely on SSDs in the GPFS clusters that we use; it is only the data pool that sits on near-line rotating disks. I understand that AFM might not be able to solve the issue, and I will try file heat to see whether it works for migrating files to a flash tier (I have put a rough policy sketch below, and a rough AFM prefetch sketch after the quoted thread, to show what I have in mind).

You mentioned an all-flash storage pool for heavily used files. Do you mean a different GPFS cluster with just flash storage, with the files copied to it manually whenever needed?

The IO performance I am talking about is predominantly for reads. Are you saying that LROC can work the way I want it to, that is, prefetch entire files into the LROC cache after only a few headers/stubs of data have been read from those files? I thought LROC only keeps the blocks of data that have actually been fetched from disk, and will not prefetch the whole file when just a stub of it is read. Please do let me know if I have understood it wrong.
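For what it is worth, the file heat approach I plan to test looks roughly like the sketch below. This is only a sketch under my own assumptions: the pool names 'flash' and 'nlsas' and the device name 'gpfsdev' are placeholders for our setup, the heat tunables are example values, and the GROUP POOL / WEIGHT(FILE_HEAT) repack idiom is how I understand the documentation, so please correct me if this is not how you meant the file heat feature to be used:

    # enable file heat tracking on the cluster (example values only)
    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # contents of heat.pol -- repack files across the two tiers by heat
    /* hottest files fill 'flash' up to ~90% occupancy, the rest stay on 'nlsas' */
    RULE 'definetiers' GROUP POOL 'tiers' IS 'flash' LIMIT(90) THEN 'nlsas'
    RULE 'repack'      MIGRATE FROM POOL 'tiers' TO POOL 'tiers' WEIGHT(FILE_HEAT)

    # run (or schedule) the repack against the filesystem
    mmapplypolicy gpfsdev -P heat.pol -I yes

My concern is that this would run on a schedule rather than react to individual jobs, which is part of why I was not sure the policy/file-heat route reacts quickly enough to how our jobs touch files.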
On Feb 22, 2018, 4:08 PM -0500, IBM Spectrum Scale <sc...@us.ibm.com>, wrote:
> I do not think AFM is intended to solve the problem you are trying to solve. If I understand your scenario correctly you state that you are placing metadata on NL-SAS storage. If that is true that would not be wise, especially if you are going to do many metadata operations. I suspect your performance issues are partially due to the fact that metadata is being stored on NL-SAS storage. You stated that you did not think the file heat feature would do what you intended, but have you tried to use it to see if it could solve your problem? I would think having metadata on SSD/flash storage combined with an all-flash storage pool for your heavily used files would perform well. If you expect IO usage will be such that there will be far more reads than writes, then LROC should be beneficial to your overall performance.
>
> Regards, The Spectrum Scale (GPFS) team
>
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
>
> If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.
>
> The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.
>
> From: vall...@cbio.mskcc.org
> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> Date: 02/22/2018 03:11 PM
> Subject: [gpfsug-discuss] GPFS and Flash/SSD Storage tiered storage
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>
> Hi All,
>
> I am trying to figure out a GPFS tiering architecture with flash storage in the front end and near-line storage as the backend, for supercomputing.
>
> The backend storage will be a GPFS storage on near-line disks of about 8-10PB. The backend storage will/can be tuned to give out large streaming bandwidth and enough metadata disks to make the stat of all these files fast enough.
>
> I was thinking whether it would be possible to use a GPFS flash cluster or GPFS SSD cluster in the front end that uses AFM and acts as a cache cluster with the backend GPFS cluster.
>
> At the end of this, the workflow that I am targeting is one where:
>
> "If the compute nodes read headers of thousands of large files ranging from 100MB to 1GB, the AFM cluster should be able to bring up enough threads to bring all of the files from the backend to the faster SSD/Flash GPFS cluster. The working set might be about 100T at a time, which I want to be on a faster/low-latency tier, and the rest of the files to stay on the slower tier until they are read by the compute nodes."
>
> The reason I do not want to use GPFS policies to achieve the above is that I am not sure whether policies could be written in a way that files are moved from the slower tier to the faster tier depending on how the jobs interact with the files. I know that policies can be written depending on heat and size/format, but I don't think these policies work in a similar way to the above.
>
> I did try the above architecture, where an SSD GPFS cluster acts as an AFM cache cluster in front of the near-line storage. However, the AFM cluster was really, really slow; it took about a few hours to copy the files from the near-line storage to the AFM cache cluster. I am not sure if AFM is not designed to work this way, or if AFM is not tuned to work as fast as it should.
>
> I have tried LROC too, but it does not behave the same way as I guess AFM works.
>
> Has anyone tried, or does anyone know, whether GPFS supports an architecture where the fast tier can bring up thousands of threads and copy the files almost instantly/asynchronously from the slow tier, whenever the jobs from compute nodes read a few blocks from these files? I understand that with respect to hardware, the AFM cluster should be really fast, as well as the network between the AFM cluster and the backend cluster.
>
> Please do also let me know if the above workflow can be done using GPFS policies and be as fast as it needs to be.
>
> Regards,
> Lohit
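In case we do end up retrying the AFM cache cluster described in the quoted thread above, the other thing I am considering is driving the prefetch explicitly from the job scheduler instead of waiting for on-demand reads. A rough sketch of what I mean, with everything hypothetical: 'gpfs1' as the cache filesystem, 'scratch_cache' as the AFM fileset, and a job-generated file list:

    # list of files the next job will touch, one path per line
    # (in practice this would come from the workflow manager)
    cat > /tmp/job.filelist <<EOF
    /gpfs1/scratch_cache/dataset/file_0001.dat
    /gpfs1/scratch_cache/dataset/file_0002.dat
    EOF

    # ask AFM to pull those files into the cache cluster ahead of the job
    mmafmctl gpfs1 prefetch -j scratch_cache --list-file /tmp/job.filelist

    # check the fileset state while the prefetch queue drains
    mmafmctl gpfs1 getstate -j scratch_cache

If I understand the documentation correctly, the parallel read tunables on the gateway nodes (afmParallelReadThreshold and afmParallelReadChunkSize, if I remember the names right) are also worth checking, since poor gateway parallelism may be part of why our first attempt was so slow.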