My thinking was mainly that single-threaded 200 files/second == 5 ms/file. Where do these 5 ms go? Is it NFS protocol overhead, or is it waiting for I/O, so that it could be fixed with a lower-latency storage backend?
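
(A rough way to make that question measurable - an illustrative C sketch only, not from the thread; the directory path and file count are placeholders and the directory must already exist. Timing the same loop once against the Ganesha mount and once against the native GPFS path should show whether the ~5 ms is per-file round-trip latency or something else.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : "/mnt/nfs/testdir"; /* placeholder mount point */
    const int nfiles = 200;
    char path[4096], buf[4096];
    memset(buf, 'x', sizeof(buf));

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/f%04d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, buf, sizeof(buf)) < 0) { perror("write"); return 1; }
        close(fd);   /* no fsync - the same pattern tar uses */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d files in %.3f s -> %.2f ms/file\n",
           nfiles, secs, secs * 1000.0 / nfiles);
    return 0;
}
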
-jf

On Wed, Oct 17, 2018 at 9:15 AM Olaf Weiser <[email protected]> wrote:
> Hallo Jan,
> you can expect slightly improved numbers from the lower response times
> of the HAWC ... but the loss of performance comes from the fact that
> GPFS (or async kNFS) writes with multiple parallel threads - as opposed
> to e.g. tar via Ganesha NFS, which comes with a single thread and an
> fsync on each file.
>
> You'll never outperform e.g. 128 (individually maybe slower) parallel
> threads running write-behind with one single, but fast, thread.
>
> So, as Alex suggests: if possible, take the GPFS client or kNFS for
> those types of workloads.
>
>
> From: Jan-Frode Myklebust <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 10/17/2018 02:24 PM
> Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS
> Sent by: [email protected]
> ------------------------------
>
> Do you know if the slow throughput is caused by the network/NFS-protocol
> layer, or does it help to use faster storage (SSD)? If on storage, have
> you considered whether HAWC can help?
>
> I'm thinking about adding an SSD pool as a first tier to hold the active
> dataset for a similar setup, but that's mainly to solve the small-file
> read workload (i.e. random I/O).
>
> -jf
>
> On Wed, Oct 17, 2018 at 07:47, Alexander Saupp <[email protected]> wrote:
> Dear Mailing List readers,
>
> I've come to a preliminary conclusion that explains the behavior in an
> appropriate manner, so I'm trying to summarize my current thinking with
> this audience.
>
> Problem statement:
> Big performance deviation between native GPFS (fast) and loopback NFS
> mount on the same node (way slower) for a single-client, single-thread,
> small-files workload.
>
> Current explanation:
> tar seems to just close() files, without an fsync(). That is an
> application choice and common behavior. The idea is to allow OS write
> caching to speed up process run time.
>
> When running locally on ext3 / xfs / GPFS / ... that allows async
> destaging of data down to disk, somewhat compromising data safety for
> better performance. As we're talking about write caching on the same
> node that the application runs on, a crash is a misfortune but stays in
> the same failure domain. E.g. if you run a compile job that includes
> extraction of a tar and the node crashes, you'll have to restart the
> entire job anyhow.
>
> The NFSv2 spec defined that NFS I/Os are to be 'sync', probably so that
> the compile job on the NFS client would survive if the NFS server
> crashes - the failure domain would be different.
>
> NFSv3 in RFC 1813 (below) acknowledged the performance impact and
> introduced the 'async' flag for NFS, which handles I/Os similarly to
> local I/Os, allowing destaging in the background.
>
> Keep in mind: applications, whether running locally or via NFS, can
> always decide to call fsync(), which ensures that data is destaged to
> persistent storage right away. But it's the application's choice whether
> that's really mandatory or whether performance has higher priority.
>
> The Linux 'sync' tool (man sync) allows syncing the 'dirty' memory cache
> down to disk, largely filesystem-independent.
>
> -> A single-client, single-thread, small-files workload on GPFS can be
> destaged async, allowing latency to be hidden and disk I/Os to be
> parallelized.
> -> NFS client I/Os are sync, so the second I/O can only be started after
> the first one has hit non-volatile memory -> much higher latency.
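
To make the distinction above concrete, here is a minimal sketch (added for illustration, not the author's code; paths are placeholders and error handling is trimmed) of the two ways an application can finish each small file:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* write a small file and let the OS destage it in the background */
static int write_buffered(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, data, len);   /* lands in the page cache */
    close(fd);                          /* returns immediately, no flush */
    return n < 0 ? -1 : 0;
}

/* write a small file and wait until it is on stable storage */
static int write_stable(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, data, len);
    int rc = fsync(fd);                 /* at least one stable-storage round trip */
    close(fd);
    return (n < 0 || rc < 0) ? -1 : 0;
}

int main(void)
{
    const char *msg = "small file payload\n";   /* placeholder data and paths */
    if (write_buffered("/gpfs/fs1/buffered.txt", msg, strlen(msg)) < 0)
        perror("write_buffered");
    if (write_stable("/gpfs/fs1/stable.txt", msg, strlen(msg)) < 0)
        perror("write_stable");
    return 0;
}

With a sync NFS server, the first variant effectively degrades to the second, because the server commits data to stable storage before replying - which is where the extra per-file milliseconds come from.
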
>
> The Spectrum Scale NFS implementation (based on Ganesha) does not
> support the async mount option, which is a bit of a pity. There might
> also be implementation differences compared to kernel NFS; I did not
> investigate in that direction.
>
> However, the principles of the difference are explained for me by the
> above behavior.
>
> One workaround that I saw working well for multiple customers was to
> replace the NFS client by a Spectrum Scale NSD client. That has two
> advantages, but is certainly not suitable in all cases:
> - Improved speed through the efficient NSD protocol and NSD client-side
> write caching
> - Write caching in the same failure domain as the application (on the
> NSD client), which seems more reasonable than NFS-server-side write
> caching.
>
> References:
>
> NFS sync vs async
> https://tools.ietf.org/html/rfc1813
> "The write throughput bottleneck caused by the synchronous definition of
> write in the NFS version 2 protocol has been addressed by adding support
> so that the NFS server can do unsafe writes. Unsafe writes are writes
> which have not been committed to stable storage before the operation
> returns. This specification defines a method for committing these unsafe
> writes to stable storage in a reliable way."
>
> sync() vs fsync()
> https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm
> - An application program makes an fsync() call for a specified file.
> This causes all of the pages that contain modified data for that file
> to be written to disk. The writing is complete when the fsync() call
> returns to the program.
> - An application program makes a sync() call. This causes all of the
> file pages in memory that contain modified data to be scheduled for
> writing to disk. The writing is not necessarily complete when the
> sync() call returns to the program.
> - A user can enter the sync command, which in turn issues a sync() call.
> Again, some of the writes may not be complete when the user is prompted
> for input (or the next command in a shell script is processed).
>
> close() vs fclose()
> A successful close does not guarantee that the data has been
> successfully saved to disk, as the kernel defers writes. It is not
> common for a file system to flush the buffers when the stream is
> closed. If you need to be sure that the data is physically stored,
> use fsync(2). (It will depend on the disk hardware at this point.)
>
> Mit freundlichen Grüßen / Kind regards
>
> Alexander Saupp
> IBM Systems, Storage Platform, EMEA Storage Competence Center
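
As a footnote to the sync() vs fsync() reference quoted above, a small illustrative sketch (not from the thread; the path is a placeholder) of the two calls from an application's point of view:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/example.dat";   /* placeholder path */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, "hello\n", 6) != 6)
        perror("write");

    fsync(fd);   /* returns only after this file's dirty pages are on stable storage */
    sync();      /* schedules writeback of all dirty pages system-wide; per POSIX it
                    may return before writing completes (Linux's sync(2) does wait) */

    close(fd);
    return 0;
}
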
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
