On 3/22/15 12:03 AM, Andrei Alexandrescu wrote:
I just took a look at making byLine faster. It took less than one evening:
https://github.com/D-Programming-Language/phobos/pull/3089
I confess I am a bit disappointed with the leadership being unable to
delegate this task to a trusty lieutenant in the community. A bug has
been open on this for a long time, and it gets discussed here regularly,
often with wrong conclusions about performance bottlenecks drawn from
unverified assumptions ("we must redo D's I/O because FILE* is killing
it!"). Yet the techniques used to get a marked improvement in the diff
above are trivial fare for any software engineer. The following factors
each had a significant impact on speed:
* On OS X (which I happened to test on), getdelim() exists but wasn't
being used. I made the implementation use it.
* There was one call to fwide() per line read. I used simple caching (a
stream's orientation cannot be changed once set, making it a perfect
candidate for caching).
(As an aside, there was some unreachable code in ByLineImpl.empty, which
didn't impact performance but was overdue for removal.)
* For each line read there was one call to malloc() and one to free(). I
set things up so that the buffer used for reading is reused, simply by
making the buffer static.
* assumeSafeAppend() was unnecessarily called once per line read. Its
removal alone yielded a whopping 35% improvement on top of everything
else. I'm not sure what it does, but boy, does it take its sweet time.
Maybe someone should look into it.
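The first three changes are plain C stdio techniques, so they can be illustrated outside of Phobos. What follows is a minimal C sketch, not the code in the pull request. It assumes a POSIX system that provides getdelim(), and the line_reader type and its functions are hypothetical names made up for this illustration. Two deliberate differences: the buffer lives in the reader struct rather than in a static variable, and instead of querying the orientation, the sketch requests byte orientation once with fwide(fp, -1) and caches whatever fwide() reports.

```c
#define _POSIX_C_SOURCE 200809L /* exposes getdelim() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <wchar.h>

/* Hypothetical reader state (illustration only): one buffer reused for
   every line, plus a cached stream orientation. */
typedef struct {
    FILE  *fp;
    char  *buf;          /* allocated and grown by getdelim(), reused across lines */
    size_t cap;          /* current capacity of buf, maintained by getdelim() */
    int    orientation;  /* 0 = not yet known, < 0 byte, > 0 wide */
    size_t fwide_calls;  /* counts fwide() calls, to show the caching works */
} line_reader;

static void line_reader_init(line_reader *r, FILE *fp) {
    r->fp = fp;
    r->buf = NULL;       /* getdelim() allocates on first use */
    r->cap = 0;
    r->orientation = 0;
    r->fwide_calls = 0;
}

/* Calls fwide() only until the stream reports an orientation; once set,
   the orientation cannot change, so the cached value stays valid. */
static int stream_orientation(line_reader *r) {
    if (r->orientation == 0) {
        r->orientation = fwide(r->fp, -1); /* request byte orientation */
        r->fwide_calls++;
    }
    return r->orientation;
}

/* Reads the next line into r->buf (including '\n' when present).
   Returns the line length, or -1 at end of file or on error. The line is
   valid only until the next call, because the buffer is reused. */
static ssize_t line_reader_next(line_reader *r) {
    assert(r != NULL && r->fp != NULL);
    if (stream_orientation(r) > 0)
        return -1; /* wide-oriented stream: not handled by this byte-oriented sketch */
    return getdelim(&r->buf, &r->cap, '\n', r->fp);
}

/* A single free() for the whole read loop instead of one per line. */
static void line_reader_free(line_reader *r) {
    free(r->buf);
    r->buf = NULL;
    r->cap = 0;
}
```

A caller loops on line_reader_next() until it returns -1 and then calls line_reader_free() once. The per-line cost is a single getdelim() call, with no fwide() call after the first and no malloc()/free() pair unless a line outgrows the buffer. Because the buffer is reused, a caller that wants to keep a line has to copy it, which is the same contract byLine has.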
Destroy.
Andrei
* Avoid most calls to GC.sizeOf.
Andrei