On Friday, 12 February 2021 at 07:23:12 UTC, frame wrote:
On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing only hashes.
But this kind of hash may be insufficient to avoid hash collisions. For data that big, slower but stronger algorithms like SHA are advisable.
Also, associative arrays use the same weak algorithm, where you can run into collision issues. So using the hash of the string data as the key can be a problem. I always use a quick hash as the key but actually hold a collection of stronger hashes in the value and do a lookup in it to be on the safe side.
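A minimal sketch of that scheme in D, assuming a hypothetical input file "data.txt" and SHA-256 as the stronger digest: the AA is keyed by the built-in quick `hashOf`, and each bucket holds the SHA-256 digests of the lines that mapped to that key, so two different lines that collide on the quick hash are still told apart.

```d
import std.algorithm.searching : canFind;
import std.digest.sha : sha256Of;
import std.stdio : File, writeln;

void main()
{
    // Hypothetical input file; substitute the real one.
    auto file = File("data.txt", "r");

    // Quick hash as AA key; each bucket keeps the SHA-256 digests of the
    // lines that landed there, so quick-hash collisions are still resolved.
    ubyte[32][][size_t] buckets;

    size_t unique;
    foreach (line; file.byLine)        // byLine reuses its buffer; we only hash it
    {
        auto fast   = hashOf(line);    // built-in quick hash
        auto strong = sha256Of(line);  // stronger digest for verification

        auto bucket = fast in buckets;
        if (bucket is null || !(*bucket).canFind(strong))
        {
            buckets[fast] ~= strong;   // new digest under this quick hash
            ++unique;
        }
    }
    writeln(unique, " unique lines");
}
```

Per line only the size_t quick hash and a 32-byte digest are kept, so memory stays far below holding the lines themselves.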
Forgot to mention that this kind of solution needs a better approach if you don't want to miss a potentially different line: you can use a weak hash, but track the line positions and count how often each hash occurs as a pre-process. In the post-process you look at those lines again and compare whether they are really identical or just hash collisions, and correct accordingly.
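A sketch of that two-pass idea in D, again with a hypothetical "data.txt" and simplified to re-read the file instead of seeking to recorded positions: the pre-process only counts how often each quick hash occurs; the post-process keeps the actual text of lines whose hash occurred more than once, so real duplicates and mere collisions can be distinguished.

```d
import std.stdio : File, writeln;

void main()
{
    enum fileName = "data.txt";        // hypothetical input file

    // Pre-process: count how often each quick hash occurs.
    size_t[size_t] counts;             // quick hash -> number of occurrences
    foreach (line; File(fileName).byLine)
        ++counts[hashOf(line)];

    // Post-process: a hash seen only once belongs to a unique line for sure.
    // Lines sharing a hash may be duplicates or collisions, so keep their
    // actual text and compare that instead.
    size_t unique;
    bool[string] suspects;
    foreach (line; File(fileName).byLine)
    {
        if (counts[hashOf(line)] == 1)
        {
            ++unique;
        }
        else
        {
            auto key = line.idup;      // copy: byLine reuses its buffer
            if (key !in suspects)
            {
                suspects[key] = true;
                ++unique;
            }
        }
    }
    writeln(unique, " unique lines");
}
```

Memory here only grows with the lines whose hash actually repeats, so an occasional collision costs one stored line rather than pulling the whole file into memory.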