On 18 June 2013 10:20, Glenn Fowler <[email protected]> wrote:
>
> you showed ast grep strace output but not gnu grep
> gnu /usr/bin/grep does read of varying chunk sizes on my redhat linux
> for NFS files the chunk sizes were around 32Ki
>
> I added some test options to the SFIO_OPTIONS env var
>
> SFIO_OPTIONS=nomaxmap # disable mmap(), force read()
> SFIO_OPTIONS=maxmap=-1 # map entire file
> SFIO_OPTIONS=maxmap=1Gi # map 1Gi chunks etc.
>
> as long as the buffer isn't trivially small I don't think lines
> spanning buffer boundaries will be a drag on timing
>
> you might be able to set up a few experiments to see if there is
> a knee in the performance curves where the space-time tradeoffs meet
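
The experiment suggested above could be scripted roughly as follows. This is a minimal sketch rather than anything posted in the thread: the GREP, PATTERN and FILE variables and the run_one helper are hypothetical names, the defaults are placeholders, and only the three SFIO_OPTIONS spellings are taken from the quoted message. It assumes GREP points at a grep linked against sfio; any other grep will simply ignore the variable. date +%s is a widely available extension with one-second resolution, so the input file needs to be large enough for differences to show.

```shell
#!/bin/sh
# Hypothetical timing harness (names and defaults are assumptions).
GREP=${GREP:-grep}          # grep under test, e.g. an ast grep build
PATTERN=${PATTERN:-foo}     # placeholder search pattern
FILE=${FILE:-/dev/null}     # placeholder input; point this at a large file

# run_one SETTING: run grep once with SFIO_OPTIONS=SETTING and print
# the setting and the elapsed wall-clock seconds, tab separated.
run_one() {
    start=$(date +%s)
    # grep exits 1 when nothing matches; that is irrelevant for timing.
    SFIO_OPTIONS=$1 "$GREP" -c -- "$PATTERN" "$FILE" >/dev/null || :
    end=$(date +%s)
    printf '%s\t%s\n' "$1" "$((end - start))"
}

# The three settings quoted above: read() only, whole-file map, 1Gi windows.
for opt in nomaxmap maxmap=-1 maxmap=1Gi; do
    run_one "$opt"
done
```

Adding further maxmap=<size> values to the loop and comparing the timings would be one way to look for the knee.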

The test results are in the email below:
- bigger mmap() windows for sfio file IO. The break even we calculated
  is somewhere between 108-140MB window size on an Intel Sandy Bridge
  four-way server and 144-157MB for an AMD Jaguar prototype machine
  (X1150). This is likely to 'fix' most of the grep performance issue.
  Also forwarding a clarifying comment: "explain to them that this is
  NOT akin to buffer bloat. It only defines the maximum size of the
  address space window a process can have for this file. The kernel
  itself is in charge of selecting an appropriate number of pages it
  can donate to pass data through this window"

Lionel

---------- Forwarded message ----------
From: Lionel Cons <[email protected]>
Date: 2 September 2013 21:41
Subject: Re: [ast-developers] ksh93v- beta, and must have bug fixes
To: ольга крыжановская <[email protected]>, Glenn Fowler <[email protected]>, David Korn <[email protected]>
Cc: "[email protected]" <[email protected]>

On 30 August 2013 21:05, ольга крыжановская <[email protected]> wrote:
> IMO before you move ksh93v- to "beta" status, we should collect bugs
> and changes which should be done and finalized before we call it beta.
> There are very, very bad issues in ast-ksh.20130829, and they should
> be addressed. I am going - due lack of Bugzilla, a list on our svn.
>
> Submissions for the worst of worst bugs please to the list now.

Is there still time to submit issues?

* Our top entries - based on engineering survey of 20130902 would be:
- sfpoll2()
- stable and reliable interface to SIGRTMIN-SIGRTMAX signals, including
  .sh.sig
- sync(1) with support for fsync() and syncfs()

* Medium priorities:
- singlebyte locales like en_GB.iso885915 must work
- support for \u and \w, as done with Roland Mainz's patch, as this has
  proved useful for my staff during today's preliminary testing. The
  patch would allow them to replace a lot of weird
  ( LC_ALL=en_US.utf8; printf '...' | iconv -f UTF-8; ) lines with a
  plain print -f '\u[hexvalue]'.
  This implies that \u and \w work on singlebyte locales too
- grep builtin performance improvements for large files. As of
  ast-ksh.2013-08-14 AST grep only reaches 85% of the performance of
  GNU grep
- bigger mmap() windows for sfio file IO. The break even we calculated
  is somewhere between 108-140MB window size on an Intel Sandy Bridge
  four-way server and 144-157MB for an AMD Jaguar prototype machine
  (X1150). This is likely to 'fix' most of the grep performance issue.
  Also forwarding a clarifying comment: "explain to them that this is
  NOT akin to buffer bloat. It only defines the maximum size of the
  address space window a process can have for this file. The kernel
  itself is in charge of selecting an appropriate number of pages it
  can donate to pass data through this window"
- /dev/file/@direct and /dev/file/@directory to open files with
  O_DIRECT (no kernel buffering, but maybe zero copy data passing) or
  O_DIRECTORY (self explanatory). I leave the exact spelling of
  /dev/@file@options@@@/@path@ to Glenn's discretion :)
- no //@//. Self explanatory. A lot of people here went to great
  lengths to clean their scripts and filter extra slashes from path
  names out of the sheer fear that //@// may blow up their scripts.
  I'd always welcome cleanup, but this fear-induced craziness was a
  real waste of engineering time

* Low priority:
- namespace feature
- public, web-based issue tracker
- more space-efficient associative arrays in ksh93. As of
  ast-ksh.2013-08-14 they consume a lot more memory than indexed arrays
- return of the fixed-size integer arrays, i.e. that
  integer -a array[5000000] preallocates memory for an indexed array
  with 5000000 slots.
  This may improve memory usage by allocating the array once instead
  of growing it entry by entry and thrashing the heap on the way
- the ksh93 test suite should pass without warnings or crashes on the
  major Linux distributions (Ubuntu/Debian, Suse, Redhat/Fedora)
- cd -@ to enter and leave attribute directories on platforms which
  support open(2) O_XATTR
- open directory file descriptor to an attribute directory on
  platforms which support open(2) O_XATTR
- mathematical constants like M_PI and MIN/MAX values for all integer
  types
- test ksh93 on ARM64

Lionel
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
