On 18 June 2013 10:20, Glenn Fowler <[email protected]> wrote:
>
> you showed ast grep strace output but not gnu grep

Sorry, I didn't verify the data sent by my staff.

> gnu /usr/bin/grep does read of varying chunk sizes on my redhat linux
> for NFS files the chunk sizes were around 32Ki

Chunk size for read() or chunk size for mmap()?

>
> I added some test options to the SFIO_OPTIONS env var
>
> SFIO_OPTIONS=nomaxmap # disable mmap(), force read()
> SFIO_OPTIONS=maxmap=-1 # map entire file
> SFIO_OPTIONS=maxmap=1Gi # map 1Gi chunks etc.
>
> as long as the buffer isn't trivially small I don't think lines
> spanning buffer boundaries will be a drag on timing

You don't understand the issue, which is about *sharing* large-page I/O
pages among processes.

Scenario: We have one very large input file (20GB+). The machine in
question has 64GB of memory. We run around 400 or more jobs to filter
and analyse the input file, mostly in parallel.

If the mmap() buffers are too small, the kernel will not share them
among the different processes and keeps copying, reading (through the
consumer of mmap()), and purging the buffers.

If the mmap() chunk size is large enough, the kernel will enable the use
of large pages (2M on AMD64, compared to the default I/O page size of
4k) and share the pages between the processes which concurrently work on
the file. The crucial point is that it does so without creating private
copies. In this particular case "large enough" means more than 128MB;
there appears to be a kernel threshold of 64 pages of 2M each before the
kernel grants 2M pages for mmap() I/O usage.

> you might be able to set up a few experiments to see if there is
> a knee in the performance curves where the space-time tradeoffs meet

I'll ask my staff.

Lionel
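
P.S. To make the chunk-size question concrete, here is a minimal sketch
in Python of a reader that walks a file through read-only mmap() windows
of a configurable size. It is not how ast grep or sfio is implemented;
the function name count_lines_chunked and the 128 MiB default are
assumptions chosen only to mirror the threshold described above. Whether
the kernel actually backs such a mapping with 2M pages depends on the
operating system and its configuration, and this sketch does not check
that.

```python
import mmap
import os


def count_lines_chunked(path, chunk_size=128 * 1024 * 1024):
    """Count newline bytes in a file using read-only mmap() windows.

    chunk_size is a hypothetical tuning knob, comparable in spirit to
    the maxmap setting discussed above; it is rounded down to a multiple
    of the allocation granularity because mmap offsets must be aligned.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    chunk_size = max(gran, (chunk_size // gran) * gran)

    total = 0
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            # ACCESS_READ requests a read-only mapping rather than a
            # private copy-on-write one (ACCESS_COPY).
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                pos = m.find(b"\n")
                while pos != -1:
                    total += 1
                    pos = m.find(b"\n", pos + 1)
            offset += length
    return total
```

Counting newlines is deliberately a task where lines spanning window
boundaries do not matter, so the only variable in a timing experiment is
the window size. Running many copies of this in parallel against one
large file with different chunk_size values would be one way to look for
the knee in the curve.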
