The Set grows as the program runs: by the end of the first loop it contains every distinct line of the first file, so you can save memory by giving the smaller file first. The output lines, however, appear in the order they occur in the second file. I don't think it's possible to avoid holding all those lines in memory without making the algorithm process the data in multiple passes.
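A minimal sketch of the two-loop idea in Python, assuming the task is to print the lines common to two files. The function name `common_lines` is hypothetical, and stripping the trailing newline (so a final line without one still matches) is my own choice. Duplicate lines in the second file are kept as they appear there; nothing here deduplicates them.

```python
from typing import Iterable, Iterator


def common_lines(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Yield lines of `second` that also occur in `first`, in `second`'s order."""
    # First loop: the set grows until it holds every distinct line of `first`.
    # Pass the smaller file here to keep memory use down.
    seen = set()
    for line in first:
        seen.add(line.rstrip("\n"))

    # Second loop: stream `second` one line at a time; only `seen` stays in memory.
    for line in second:
        stripped = line.rstrip("\n")
        if stripped in seen:
            yield stripped
```

Usage would be something like `with open("small.txt") as a, open("big.txt") as b:` followed by `for line in common_lines(a, b): print(line)` (the file names are placeholders). Only the first file's distinct lines are held in memory; the second file is read lazily.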
On Mon, Feb 22, 2016 at 3:16 PM, barbara.g <[email protected]> wrote:
> I'm extremely grateful for your detailed answer. I'm not sure I have
> understood the implementation of the algorithm: the Set you mention will
> contain a single line during the entire process or will become heavier as
> time goes by ? Bye !
