Do you know whether the slow throughput is caused by the network/NFS-protocol layer, or would faster storage (SSD) help? If storage is the bottleneck, have you considered whether HAWC can help?
I'm thinking about adding an SSD pool as a first tier to hold the active
dataset for a similar setup, but that's mainly to solve the small-file read
workload (i.e. random I/O).

-jf

On Wed, 17 Oct 2018 at 07:47, Alexander Saupp <[email protected]> wrote:

> Dear Mailing List readers,
>
> I've come to a preliminary conclusion that explains the behavior in an
> appropriate manner, so I'm trying to summarize my current thinking with
> this audience.
>
> *Problem statement:*
>
> Large performance gap between native GPFS (fast) and a loopback NFS mount
> on the same node (much slower) for a single-client, single-thread,
> small-files workload.
>
> *Current explanation:*
>
> tar seems to simply close() files without calling fsync(). That is an
> application choice and common behavior. The idea is to let OS write
> caching speed up the process run time.
>
> When running locally on ext3 / xfs / GPFS / ... that allows async
> destaging of data down to disk, trading some data durability for better
> performance.
> As we're talking about write caching on the same node that the
> application runs on, a crash is a misfortune, but it stays in the same
> failure domain.
> E.g. if you run a compile job that includes extracting a tar and the node
> crashes, you have to restart the entire job anyway.
>
> The NFSv2 spec defined NFS writes to be synchronous, probably so that the
> compile job on the NFS client would survive a crash of the NFS server;
> the failure domains are different.
>
> NFSv3 (RFC 1813, referenced below) acknowledged the performance impact
> and introduced unsafe (asynchronous) writes - the 'async' option - which
> handle I/Os similarly to local I/Os and allow destaging in the background.
>
> Keep in mind: applications, whether running locally or via NFS, can
> always decide to call fsync(), which ensures that data is destaged to
> persistent storage right away.
> But it is the application's choice whether that is really mandatory or
> whether performance has higher priority.
>
> The Linux 'sync' tool (man sync) flushes 'dirty' data from the memory
> cache down to disk, largely independent of the filesystem.
>
> -> A single-client, single-thread, small-files workload on GPFS can be
> destaged asynchronously, which hides latency and parallelizes disk I/Os.
> -> NFS client I/Os are synchronous, so the second I/O can only start
> after the first one has hit non-volatile storage -> much higher latency.
>
> The Spectrum Scale NFS implementation (based on Ganesha) does not support
> the async option, which is a bit of a pity. There might also be
> implementation differences compared to kernel NFS; I did not investigate
> in that direction.
>
> However, for me the above behavior explains the principle behind the
> difference.
>
> One workaround that I have seen working well for multiple customers is to
> replace the NFS client with a Spectrum Scale NSD client.
> That has two advantages, but is certainly not suitable in all cases:
> - Improved speed through the efficient NSD protocol and NSD client-side
> write caching
> - Write caching in the same failure domain as the application (on the NSD
> client), which seems more reasonable than NFS server-side write caching.
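To make the application choice described above concrete, here is a minimal C
sketch (an illustration only, not part of the quoted mail; file and program
names are made up): it writes a number of small files, either relying on OS
write caching like tar does (plain close()) or forcing each file to stable
storage with fsync() before close(). On the synchronous NFS path described
above, every small write is expected to behave much like the fsync() variant,
since each I/O has to reach stable storage before the next one can proceed.

    /* smallfiles.c - hypothetical test program, illustration only.
     * Writes <nfiles> 4 KiB files into <dir>; with --fsync each file is
     * forced to stable storage before close(), otherwise the OS page
     * cache is allowed to destage asynchronously.
     * Build: cc -o smallfiles smallfiles.c
     * Run:   ./smallfiles /path/to/dir 1000 [--fsync]
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <dir> <nfiles> [--fsync]\n", argv[0]);
            return 1;
        }
        const char *dir = argv[1];
        int nfiles = atoi(argv[2]);
        int do_fsync = (argc > 3 && strcmp(argv[3], "--fsync") == 0);

        char buf[4096];
        memset(buf, 'x', sizeof(buf));          /* 4 KiB payload per file */

        for (int i = 0; i < nfiles; i++) {
            char path[4096];
            snprintf(path, sizeof(path), "%s/f%06d", dir, i);

            int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                perror("write"); return 1;
            }
            if (do_fsync && fsync(fd) != 0) {   /* force data to stable storage */
                perror("fsync"); return 1;
            }
            close(fd);                          /* close() alone does not flush */
        }
        return 0;
    }

Comparing both modes on the native GPFS mount versus the loopback NFS mount
should make the latency effect described above directly visible.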
> *References:*
>
> NFS sync vs async
> https://tools.ietf.org/html/rfc1813
> *The write throughput bottleneck caused by the synchronous definition of
> write in the NFS version 2 protocol has been addressed by adding support
> so that the NFS server can do unsafe writes.*
> Unsafe writes are writes which have not been committed to stable storage
> before the operation returns. This specification defines a method for
> committing these unsafe writes to stable storage in a reliable way.
>
> *sync() vs fsync()*
> https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm
> - An application program makes an fsync() call for a specified file. This
> causes all of the pages that contain modified data for that file to be
> written to disk. The writing is complete when the fsync() call returns to
> the program.
>
> - An application program makes a sync() call. This causes all of the file
> pages in memory that contain modified data to be scheduled for writing to
> disk. The writing is not necessarily complete when the sync() call
> returns to the program.
>
> - A user can enter the sync command, which in turn issues a sync() call.
> Again, some of the writes may not be complete when the user is prompted
> for input (or the next command in a shell script is processed).
>
> *close() vs fclose()*
> A successful close does not guarantee that the data has been successfully
> saved to disk, as the kernel defers writes. It is not common for a file
> system to flush the buffers when the stream is closed. If you need to be
> sure that the data is physically stored, use fsync(2). (It will depend on
> the disk hardware at this point.)
>
> Mit freundlichen Grüßen / Kind regards
>
> *Alexander Saupp*
>
> IBM Systems, Storage Platform, EMEA Storage Competence Center
> ------------------------------
> Phone:  +49 7034-643-1512    IBM Deutschland GmbH
> Mobile: +49-172 7251072      Am Weiher 24
> Email:  [email protected]      65451 Kelsterbach, Germany
> ------------------------------
> IBM Deutschland GmbH / Chairman of the Supervisory Board: Martin Jetter
> Management Board: Matthias Hartmann (Chairman), Norbert Janzen, Stefan
> Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
> Registered office: Ehningen / Registration court: Amtsgericht Stuttgart,
> HRB 14562 / WEEE Reg. No. DE 99369940
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
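As a small companion to the sync() vs fsync() and close() vs fclose()
references quoted above, a minimal C sketch (again only an illustration, not
from the quoted mail; the file name is made up): fsync(fd) blocks until the
modified data of that one file has reached stable storage, close() alone
gives no such guarantee, and sync() requests a flush of all dirty data
system-wide (per POSIX it may return before the writes have completed, as the
AIX text above describes; Linux in practice waits for completion).

    /* sync_vs_fsync.c - illustration only.
     * Shows the difference in scope and completion guarantees between
     * fsync() on one file descriptor and a system-wide sync().
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char msg[] = "hello, stable storage\n";
        if (write(fd, msg, sizeof(msg) - 1) < 0) { perror("write"); return 1; }

        /* Blocks until this file's modified data is on stable storage. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }

        /* Does NOT guarantee the data is on disk by itself. */
        close(fd);

        /* Flush all dirty data everywhere; POSIX only requires scheduling,
         * Linux additionally waits for the writes to complete. */
        sync();
        return 0;
    }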
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
