On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:
I've ported a small script from C to D. The original C version
takes roughly 6.5 minutes to parse a 12G file while the port
originally took about 48 minutes. My naïve attempt to improve
the situation pushed it over an hour and 15 minutes. However,
replacing std.stdio:File with core.stdc.stdio:FILE* and
changing my output code in this latest version from:
outputFile.writefln("%c\t%u\t%u\t%d.%09u\t%c", ...)
to
fprintf(outputFile, "%c,%u,%u,%llu.%09llu,%c\n", ...)
reduced the processing time to roughly 7.5 minutes. Why is
File.writefln() so appallingly slow? Is there a better D
alternative?
First, strace your program. The slowest thing about I/O is the
syscall itself. If the D program does more syscalls, it's going
to be slower almost no matter what else is going on. Both D and C
are using libc to buffer I/O to reduce syscalls, but you might be
defeating that by constantly flushing the buffer.
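One common way to cut down on flushes and per-call overhead is to batch formatted lines into a buffer and write the whole chunk in one call. A minimal sketch (the format string and field values here are made up, not taken from the original program):

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : stdout;

void main()
{
    // Accumulate many formatted records in memory first...
    auto buf = appender!string();
    foreach (i; 0 .. 3)
        buf.formattedWrite("%c\t%d\t%d\n", 'x', i, i * 2);

    // ...then hand the whole batch to libc in a single write call,
    // instead of paying locking/flush overhead once per writefln.
    stdout.rawWrite(buf.data);
}
```

In a real run you'd flush the appender every few thousand records rather than holding 12G of output in memory.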
I tried std.io but write() only outputs ubyte[] while I'm
trying to output text, so I abandoned that idea early.
string -> immutable(ubyte)[]: alias with
std.string.representation(st)
"Alias" meaning this doesn't allocate: it gives you a byte slice
of the same memory the string is using.
You'd still need to do the formatting before writing.
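Putting the two together might look like this sketch: format into a string first, then reinterpret it as bytes for a byte-oriented write() (the values are illustrative):

```d
import std.format : format;
import std.string : representation;

void main()
{
    // Formatting allocates the string...
    string line = format("%c\t%d\t%d\n", 'A', 7, 42);

    // ...but representation does not: it's just a reinterpreting cast,
    // giving a ubyte view of the same memory, suitable for write().
    immutable(ubyte)[] bytes = line.representation;

    assert(bytes.length == line.length);
    assert(bytes.ptr is cast(immutable(ubyte)*) line.ptr); // same memory
}
```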
Now that I've got the program execution time within an
acceptable range, I tried replacing core.stdc.fread() with
std.io.read() but that increased the time to 24 minutes. Now
I'm starting to think there is something seriously wrong with
my understanding of how to use D correctly because there's no
way D's input/output capabilities can suck so bad in comparison
to C's.
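Reading is the same story as writing: the chunk size per call matters more than which wrapper you use. A sketch of reading in large chunks with std.stdio.File.rawRead (the file name, its size, and the 64K buffer are illustrative, not from the original program):

```d
import std.file : remove, write;
import std.stdio : File;

void main()
{
    // Demo input; the real program would open its existing 12G file.
    write("demo.bin", new ubyte[200_000]);
    scope(exit) remove("demo.bin");

    auto f = File("demo.bin", "rb");
    auto buf = new ubyte[64 * 1024];   // one big buffer, reused every read
    size_t total;

    // rawRead returns the slice actually filled; empty means EOF.
    for (auto got = f.rawRead(buf); got.length; got = f.rawRead(buf))
        total += got.length;           // process got[] here

    assert(total == 200_000);
}
```

If your std.io.read calls were pulling small amounts per call, the syscall count alone would explain the 24-minute figure; strace would confirm it.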