On 18 June 2013 10:20, Glenn Fowler <[email protected]> wrote:
>
> you showed ast grep strace output but not gnu grep

sorry, I didn't verify the data sent by my staff

> gnu /usr/bin/grep does read of varying chunk sizes on my redhat linux
> for NFS files the chunk sizes were around 32Ki

chunk size for read() or chunk size for mmap()?

>
> I added some test options to the SFIO_OPTIONS env var
>
>         SFIO_OPTIONS=nomaxmap   # disable mmap(), force read()
>         SFIO_OPTIONS=maxmap=-1  # map entire file
>         SFIO_OPTIONS=maxmap=1Gi # map 1Gi chunks etc.
>
> as long ast the buffer isn't trivially small I don't think lines
> spanning buffer boundaries will be a drag on timing

You don't understand the issue related to *sharing* largepage I/O
pages among processes.
Scenario:
We have one very large 20GB+ input file. The machine in question has
64GB memory. We run around 400 or more jobs to filter and analyse the
input file, mostly in parallel. If the mmap() buffers are too small
then the kernel will not share them among different processes and
keeps copying, reading (through the consumer of mmap()) and purging
the buffers.
If the mmap() chunk size is large enough, the kernel will enable the
use of largepages (2M on AMD64, compared to the default I/O page size
of 4k). In this particular case that means > 128MB; it appears to be a
kernel threshold of 64 pages of 2M each which triggers the kernel to
grant 2M pages for mmap() I/O usage. The kernel then shares the pages
between the processes which concurrently work on the file and, this is
the crucial point, does so without creating private copies.
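
A minimal sketch of the access pattern I mean, in C. This is not the
sfio or ast grep code; count_lines_mmap(), LARGE_PAGE_SIZE and
CHUNK_SIZE are made-up names for illustration. It walks a file through
read-only MAP_SHARED windows of 128MB (64 x 2M, the threshold we
observed on our machine, not a documented kernel constant). Whether the
kernel really backs such a mapping with largepages depends on the OS
and its configuration, so treat that part as an assumption.

```c
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values taken from the observation in this thread:
 * 64 pages of 2M each = 128MB per mapping window. */
#define LARGE_PAGE_SIZE ((size_t)2 << 20)
#define CHUNK_SIZE      ((size_t)64 * LARGE_PAGE_SIZE)

/* Count newline characters in `path` by mapping the file read-only in
 * windows of at most `chunk` bytes. `chunk` must be a non-zero multiple
 * of the system page size so every window offset stays page aligned.
 * Returns the number of newlines, or -1 on error. */
long long count_lines_mmap(const char *path, size_t chunk)
{
    assert(chunk > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long long lines = 0;
    off_t off = 0;
    while (off < st.st_size) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)chunk ? (size_t)left : chunk;

        /* MAP_SHARED + PROT_READ: the pages come from the file cache
         * and can be shared with other processes mapping the same
         * range; no private copy is requested. */
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        const char *s = p;
        const char *end = s + len;
        while ((s = memchr(s, '\n', (size_t)(end - s))) != NULL) {
            lines++;
            s++;
        }

        munmap(p, len);
        off += (off_t)len;
    }

    close(fd);
    return lines;
}
```

Calling count_lines_mmap(path, CHUNK_SIZE) from each of the parallel
jobs gives every job the same large, page-aligned windows onto the
file, which is the situation in which we saw the pages being shared.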

>
> you might be able to set up a few experiments to see if there is
> a knee in the performance curves where the space-time tradeoffs meet

I'll ask my staff

Lionel
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
