Re: task parallelize dirEntries

Johnson via Digitalmars-d-learn Fri, 11 Aug 2017 15:01:15 -0700

On Friday, 11 August 2017 at 21:33:51 UTC, Arun Chandrasekaran wrote:

I've modified the sample from tour.dlang.org to calculate the md5 digest of the files in a directory using std.parallelism.
When I run this on a dir with a huge number of files, I get:
core.exception.OutOfMemoryError@src/core/exception.d(696): Memory allocation failed
Since dirEntries returns a range, I thought std.parallelism.parallel can make use of that without loading the entire file list into memory.
What am I doing wrong here? Is there a way to achieve what I'm expecting?
```
import std.digest.md;
import std.stdio: writeln;
import std.file;
import std.algorithm;
import std.parallelism;

void printUsage()
{
    writeln("Loops through a given directory and calculates the md5 digest of each file encountered.");
    writeln("Usage: md <dirname>");
}

void safePrint(T...)(T args)
{
    synchronized
    {
        import std.stdio : writeln;
        writeln(args);
    }
}

void main(string[] args)
{
    if (args.length != 2)
        return printUsage;
    foreach (d; parallel(dirEntries(args[1], SpanMode.depth).filter!(f => f.isFile), 1))
    {
        auto md5 = new MD5Digest();
        md5.reset();
        auto data = cast(const(ubyte)[]) read(d.name);
        md5.put(data);
        auto hash = md5.finish();
        import std.array;
        string[] t = split(d.name, '/');
        safePrint(toHexString!(LetterCase.lower)(hash), " ", t[$-1]);
    }
}
```

Just a thought, maybe the GC isn't cleaning up quickly enough? You are allocating an MD5 digest each iteration.

Possibly, an optimization is to use a collection of MD5 digests and reuse them, e.g., pre-allocate 100 (you probably only need as many as the number of parallel loops going) and then attempt to reuse them. If all are in use, wait for a free one. Might require some synchronization.
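
A rough sketch of the reuse idea, untested against the original directory. Instead of a hand-rolled pool with waiting, it assumes std.parallelism's worker-local storage is acceptable: taskPool.workerLocalStorage keeps one MD5Digest per worker thread, so no explicit locking is needed around the digests. The helper name md5Hex is made up for illustration, and std.path.baseName stands in for the manual split on '/'. Note that read() still loads each file fully into memory and finish() still allocates the result array, so this only removes the per-file digest allocation and may not by itself cure the OutOfMemoryError.

```d
import std.algorithm : filter;
import std.digest.md;
import std.file : dirEntries, read, SpanMode;
import std.parallelism : parallel, taskPool;
import std.path : baseName;
import std.stdio : writeln;

// Hypothetical helper: hash one file with a caller-supplied digest object,
// so that no new MD5Digest has to be allocated per file.
string md5Hex(MD5Digest md5, string path)
{
    md5.reset();
    md5.put(cast(const(ubyte)[]) read(path));
    return toHexString!(LetterCase.lower)(md5.finish());
}

void main(string[] args)
{
    if (args.length != 2)
    {
        writeln("Usage: md <dirname>");
        return;
    }

    // One digest per worker thread. The argument is lazy, so the
    // `new MD5Digest()` expression is evaluated once for each slot.
    auto digests = taskPool.workerLocalStorage!MD5Digest(new MD5Digest());

    foreach (d; parallel(dirEntries(args[1], SpanMode.depth).filter!(f => f.isFile), 1))
    {
        // digests.get returns the digest belonging to the current thread.
        auto hex = md5Hex(digests.get, d.name);
        synchronized
        {
            writeln(hex, " ", baseName(d.name));
        }
    }
}
```

If the digest allocation turns out not to be the culprit, the next thing to look at would probably be the read() call, since it pulls whole files into GC memory from several threads at once.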
