On Saturday, 3 November 2018 at 14:26:02 UTC, dwdv wrote:
Hi there,

the task is simple: count word occurrences from stdin (around 150 MB in this case) and print sorted results to stdout in a somewhat idiomatic fashion.

Now, D is quite elegant while maintaining high performance compared to both C and C++, but I, as a complete beginner, can't identify where the 10x memory usage (~300 MB, see results below) is coming from.

Unicode overhead? Internal buffer? Is something slurping the whole file? Assoc array allocations? Couldn't find huge allocs with dmd -vgc and -profile=gc either. What did I do wrong?

Not exactly the same problem, but there is relevant discussion in the blog post I wrote a while ago: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

See in particular the section on Associative Array lookup optimization. This takes advantage of the fact that it's only necessary to create the immutable string the first time a key is entered into the hash. Subsequent occurrences do not need to take this step. As creating the immutable string allocates new memory, even if it is only used temporarily, skipping it is a meaningful savings.
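
As a rough sketch of that idiom (names and loop structure are mine, illustrative rather than the exact code from the post):

    import std.algorithm : splitter;
    import std.stdio : stdin;

    uint[string] counts;
    foreach (line; stdin.byLine)                // 'line' is a reused char[] buffer
    {
        foreach (word; line.splitter(' '))
        {
            if (auto count = word in counts)
                ++(*count);                     // existing key: no allocation
            else
                counts[word.idup] = 1;          // new key: one immutable copy
        }
    }

Looking up with the mutable 'word' slice avoids the copy entirely for keys that are already present.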

There have been additional APIs added to the AA interface since I wrote the blog post; I believe it is now possible to accomplish the same thing with more succinct code.

Other optimization possibilities:
* Avoid auto-decode: Not sure if your code is hitting this, but if so it's a significant performance hit. Unfortunately, it's not always obvious when this is happening. The task you are performing doesn't need auto-decode because it is splitting on single-byte UTF-8 char boundaries (newline and space). A short sketch follows this list.

* LTO on druntime/phobos: This is easy and will give a material speedup. Simply add
        '-defaultlib=phobos2-ldc-lto,druntime-ldc-lto'
to the 'ldc2' build line, after the '-flto=full' entry. This will be a win because it enables a number of optimizations in the inner loop. An example build line follows this list.

* Reading the whole file vs line by line - 'byLine' is really fast. It's also nice and general, as it allows reading arbitrarily sized files or standard input without changes to the code. However, it's not as fast as reading the file in a single shot. See the sketch after this list.

* std.algorithm.joiner - Has improved dramatically, but is still slower than a foreach loop. See: https://github.com/dlang/phobos/pull/6492
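
On the auto-decode point, one way to be certain no decoding happens is to operate on the raw bytes; a minimal sketch (assuming space-separated words, names mine):

    import std.algorithm : splitter;
    import std.stdio : stdin;
    import std.string : assumeUTF, representation;

    foreach (line; stdin.byLine)
    {
        // ubyte[] is never auto-decoded by Phobos range algorithms
        foreach (word; line.representation.splitter(cast(ubyte) ' '))
        {
            auto w = word.assumeUTF;            // view the ubyte[] slice as char[] again
        }
    }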
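
For the LTO item, an illustrative build line (the source file name is a placeholder):

    ldc2 -O3 -release -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto wordcount.d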
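
For reading in a single shot, a rough sketch using a file path rather than stdin (the path is hypothetical):

    import std.algorithm : splitter;
    import std.file : readText;

    auto text = readText("input.txt");          // slurp the whole file at once
    foreach (line; text.splitter('\n'))
        foreach (word; line.splitter(' '))
        {
            // 'word' is an immutable slice into 'text', so it can be used
            // directly as an AA key without an .idup copy
        }

A nice side effect: because the slices are immutable and outlive the loop, the key copy from the earlier AA sketch is no longer needed.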

--Jon

