On 8/12/20 6:44 PM, methonash wrote:
> Hi,
>
> Relative beginner to D-lang here, and I'm very confused by the apparent
> performance disparity I've noticed between programs that do the following:
>
> 1) cat some-large-file | D-program-reading-stdin-byLine()
> 2) D-program-directly-reading-file-byLine() using the File struct
>
> The difference I've noticed between options (1) and (2) in D is roughly
> 80% more wall time (7.5s vs 4.1s), which seems pretty extreme.
>
> For comparison, I tried the same thing in Perl with the same large
> file, and noticed only a 25% difference (10s vs 8s), which I imagine
> is partially attributable to the overhead of the pipe and its buffer.
>
> So, is this difference in D-lang performance typical? Is this expected
> behavior?
>
> I was wondering whether this has anything to do with the library
> definition of std.stdio.stdin
> (https://dlang.org/library/std/stdio/stdin.html). Does global
> file-locking significantly affect read performance?
>
> For reference: I'm trying to build a single-threaded application. My
> present use case cannot benefit from parallelism, because its ultimate
> purpose is to serve as a single-threaded downstream filter for an
> upstream application that consumes (n-1) system threads.

Are we missing the obvious here? cat needs to read from disk, write the
results into a pipe buffer, then context-switch into your D program,
and then the D program reads from the pipe buffer.

Reading the file directly, by contrast, only needs to read from the file.

The difference does seem a bit extreme, so maybe there is another, more
complex explanation.

But reading from stdin certainly doesn't do anything different from
reading from a file if you are using the File struct; stdin is itself
a File.

A more appropriate test might be to use the shell to redirect the file
into the D program's stdin:

dprogram < FILE

That way, the same code runs for both tests.
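
A minimal sketch of such a test program, assuming a simple line counter stands in for the real filter (countLines and the program itself are hypothetical, not from the original post). It reads the file named on the command line, or stdin if none is given, and both paths run the same File.byLine loop:

```d
import std.stdio;

// Hypothetical helper: counts lines with File.byLine. It accepts any File,
// so a named file and stdin go through exactly the same loop.
size_t countLines(File input)
{
    size_t count = 0;
    foreach (line; input.byLine())
        ++count;
    return count;
}

void main(string[] args)
{
    // Open the file named on the command line, or fall back to stdin
    // (which is itself a std.stdio.File) when no argument is given.
    File input = args.length > 1 ? File(args[1], "r") : stdin;
    writeln(countLines(input));
}
```

Compiled as, say, countlines, it can be timed three ways:

    time sh -c 'cat FILE | ./countlines'
    time ./countlines < FILE
    time ./countlines FILE

The last two execute identical D code, so comparing them against the first should largely isolate the cost of going through the pipe.
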
-Steve