On Mon, Mar 13, 2017 at 04:50:49PM +0000, XavierAP via Digitalmars-d-learn wrote:
> It's not easy to do by hand of course, but I was wondering if there
> was one simple function taking two file names and just returning a
> bool or something like that. I haven't found it in std.file.

Why is it not easy to do by hand? All you have to do is open the two
files, then iterate over their data and compare. Of course, you'd want
to chunk them up to minimize I/O roundtrips, but it'd be something along
the lines of:

	bool isEqual(string filename1, string filename2)
	{
		import std.algorithm.comparison : equal;
		import std.range : zip;
		import std.stdio : File, chunks;

		auto f1 = File(filename1);
		auto f2 = File(filename2);

		size_t blockSize = 4096; // or something similar

		return f1.chunks(blockSize).equal(f2.chunks(blockSize));
	}


> If such a function doesn't exist in Phobos but there's a good
> implementation in some other library, I'm interested to know. Although
> this time it's for a unit test so I'd rather implement it in two lines
> than add a dependency.
>
> And otherwise to write it by hand, how do you think is the best way?
> And in terms of performance? By chunks in case of a binary comparison?
> And what about the case of a text comparison?
[...]

Binary comparison is easy. Just read the files in fixed-size chunks and
compare them.

Text comparison is a can of worms. What kind of comparison do you have
in mind? Case-sensitive or insensitive? Is it ASCII or Unicode? What
kind of Unicode encoding is involved? Are the files expected to be in
one of the normalization forms, or is it a free-for-all? Do you expect
grapheme equivalence or just character equivalence?

Depending on the answers to these questions, text comparison can range
from trivial to hair-raisingly complex. But the fundamental idea
remains the same: read a stream of characters / graphemes from each file
in tandem, preferably buffered, and compare them using some given
comparison function.


P.S. I just realized that std.stdio.chunks() doesn't return a range, so
the snippet above won't compile as written. Bah. File an enhancement
request. I might even submit a PR for it. ;-)


T

-- 
There are four kinds of lies: lies, damn lies, and statistics.
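
Since std.stdio.chunks() is not a range (see the P.S.), one way to get
the same effect is File.byChunk, which does return an input range of
ubyte[] buffers. What follows is only a sketch under that substitution,
not a Phobos function: the early size check, the "rb" mode and the name
isTextEqual are my own assumptions. The text variant just walks both
files line by line in tandem and leaves the choice of comparison to a
predicate, so it answers none of the Unicode questions above by itself.

```d
import std.algorithm.comparison : equal;
import std.stdio : File;

/// Binary comparison: true if both files hold exactly the same bytes.
bool isEqual(string filename1, string filename2)
{
    auto f1 = File(filename1, "rb");
    auto f2 = File(filename2, "rb");

    // Assumption: a cheap early exit is wanted. Files of different
    // sizes can never compare equal.
    if (f1.size != f2.size)
        return false;

    enum blockSize = 4096; // or something similar

    // byChunk yields ubyte[] slices of a per-file buffer; equal()
    // compares the two ranges front by front, by content.
    return f1.byChunk(blockSize).equal(f2.byChunk(blockSize));
}

/// Text comparison (hypothetical helper): reads both files line by line
/// in tandem and compares each pair of lines with `pred`.
/// The default predicate is plain code-unit equality.
bool isTextEqual(alias pred = "a == b")(string filename1, string filename2)
{
    auto f1 = File(filename1);
    auto f2 = File(filename2);

    // byLine strips the trailing '\n' by default; a '\r' from CRLF
    // files stays in the line and takes part in the comparison.
    return f1.byLine.equal!pred(f2.byLine);
}
```

For a case-insensitive comparison, something like
isTextEqual!((a, b) => icmp(a, b) == 0)(file1, file2) with std.uni.icmp
should do. Normalization forms and grapheme equivalence would need a
more elaborate predicate than this.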