On Wednesday, 17 February 2021 at 04:10:24 UTC, tsbockman wrote:
On files small enough to fit in RAM, it is similar in speed to the other solutions posted, but less memory hungry. Memory consumption in this case is around (sourceFile.length + 32 * lineCount * 3 / 2) bytes. Run time is about 3 seconds per GiB on my desktop.
Oops, I think the memory consumption should be (sourceFile.length + 32 * (lineCount + largestBucket.lineCount / 2)) bytes. (In the limit where everything ends up in one bucket, the two estimates are the same, but that shouldn't normally happen unless the entire file has only one unique line in it.)
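
To make the difference between the two estimates concrete, here is a rough sketch in Python that just evaluates both formulas. The function names and the sample numbers (a 1 GiB file, 10 million lines, 100,000 lines in the largest bucket) are made up for illustration; only the formulas themselves come from the posts above.

```python
def original_estimate(file_bytes: int, line_count: int) -> int:
    """First estimate: sourceFile.length + 32 * lineCount * 3 / 2."""
    # 32 * line_count * 3 is always even, so integer division is exact.
    return file_bytes + 32 * line_count * 3 // 2


def corrected_estimate(file_bytes: int, line_count: int,
                       largest_bucket_lines: int) -> int:
    """Corrected estimate:
    sourceFile.length + 32 * (lineCount + largestBucket.lineCount / 2).
    """
    # 32 * (n + b / 2) expands to 32 * n + 16 * b, which stays in integers.
    return file_bytes + 32 * line_count + 16 * largest_bucket_lines


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    file_bytes = 2**30          # 1 GiB source file
    line_count = 10_000_000     # total lines
    largest_bucket = 100_000    # lines in the largest bucket

    print(original_estimate(file_bytes, line_count))
    print(corrected_estimate(file_bytes, line_count, largest_bucket))

    # Worst case: every line lands in a single bucket, so both agree.
    print(corrected_estimate(file_bytes, line_count, line_count)
          == original_estimate(file_bytes, line_count))
```

With well-spread buckets the corrected figure is dominated by the 32 * lineCount term, so the extra cost on top of the file itself is closer to 32 bytes per line than to 48.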