I'm trying to read in a text file that has many duplicated lines and write out a file with all the duplicates removed. By the end of the snippet below, memory usage is ~5x the size of the infile (and each infile can be multiple GB), so when this runs in a loop the memory usage becomes unmanageable and often ends in an OutOfMemory error or a complete lock-up of the system. Is there a way to reduce the memory usage of this code without noticeably sacrificing speed? My assumption is that the .sort.uniq chain is what needs improving, but the only alternative I can think of is the associative-array approach sketched after the snippet, and I don't know whether it would end up noticeably slower.

Windows 10 x64
LDC - the LLVM D compiler (1.21.0-beta1):
  based on DMD v2.091.0 and LLVM 10.0.0

-----------------------------------

import std.algorithm : sort, uniq;
import std.array : appender;
import std.conv : to;
import std.path : stripExtension;
import std.stdio : File;

auto filename = "path\\to\\file.txt.temp";
auto array = appender!(string[]);
File infile = File(filename, "r");
foreach (line; infile.byLine) {
    array ~= line.to!string; // byLine reuses its buffer, so each line has to be copied
}
File outfile = File(stripExtension(filename), "w");
foreach (element; array[].sort.uniq) {
    outfile.myrawWrite(element ~ "\n"); // custom rawWrite wrapper, used to not print the \r on Windows
}
outfile.close;
array.clear;       // reset the appender between loop iterations
array.shrinkTo(0); // redundant after clear; neither call frees the buffer's capacity
infile.close;

-----------------------------------
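
The associative-array idea mentioned above: dedupe with a bool[string] used as a set and stream each line straight to the outfile instead of buffering the whole file first. Memory then scales with the number of unique lines rather than total lines, though the output comes out in first-seen order instead of sorted, and I haven't benchmarked whether the hashing beats .sort.uniq. A minimal, untested sketch (dedupToFile is just a placeholder name):

-----------------------------------

import std.path : stripExtension;
import std.stdio : File;

// Sketch: stream lines through a set instead of buffering everything.
// Peak memory is bounded by the unique lines held in the set,
// but output order is first-seen rather than sorted.
void dedupToFile(string filename)
{
    bool[string] seen; // associative array used as a set
    auto infile = File(filename, "r");
    auto outfile = File(stripExtension(filename), "w");
    foreach (line; infile.byLineCopy) // byLineCopy allocates a fresh string per line
    {
        if (line !in seen)
        {
            seen[line] = true;
            outfile.rawWrite(line); // rawWrite skips the \n -> \r\n translation on Windows
            outfile.rawWrite("\n");
        }
    }
    outfile.close();
    infile.close();
}

-----------------------------------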

Thanks.
