On Thursday, 2 November 2023 at 15:46:23 UTC, confuzzled wrote:
I've ported a small script from C to D. The original C version takes roughly 6.5 minutes to parse a 12G file while the port originally took about 48 minutes. My naïve attempt to improve the situation pushed it over an hour and 15 minutes. However, replacing std.stdio:File with core.stdc.stdio:FILE* and changing my output code in this latest version from:

        outputFile.writefln("%c\t%u\t%u\t%d.%09u\t%c", ...)

to
        fprintf(outputFile, "%c,%u,%u,%llu.%09llu,%c\n", ...)

reduced the processing time to roughly 7.5 minutes. Why is File.writefln() so appallingly slow? Is there a better D alternative?

First, strace your program. The slowest thing about I/O is the syscall itself. If the D program does more syscalls, it's going to be slower almost no matter what else is going on. Both D and C are using libc to buffer I/O to reduce syscalls, but you might be defeating that by constantly flushing the buffer.


I tried std.io but write() only outputs ubyte[] while I'm trying to output text so I abandoned idea early.

string -> immutable(ubyte)[]: alias with std.string.representation(st)

'alias' meaning, this doesn't allocate. If gives you a byte slice of the same memory the string is using.

You'd still need to do the formatting, before writing.

Now that I've got the program execution time within an acceptable range, I tried replacing core.stdc.fread() with std.io.read() but that increased the time to 24 minutes. Now I'm starting to think there is something seriously wrong with my understanding of how to use D correctly because there's no way D's input/output capabilities can suck so bad in comparison to C's.


Reply via email to