I have a semi-big mess here. Never mind how I managed it, but let me describe what I want to do, and maybe some kind soul can steer me to something on CPAN or just offer some helpful clues.
I have two directories of news/mail messages containing different numbers of files. However, I expect there is file overlap in both directions. That is, some files are the same message in both directories.

It gets even more complicated in that there may be different headers present even though the message body is identical. By different headers I mean 1 or 2, like a differing Xref header, or possibly the Subject:, From: or Newsgroups: headers. That may even be true inside a single directory. That is, there may be duplicate bodies in the same directory too.

All I can think of for coding is to ignore the headers completely and compare only the bodies. But even doing that sounds like a fairly complicated undertaking. It seems like every single message body would need to be compared first to every other body in its own directory and then to all of those in the other directory. At least I expect the bodies to be identical, not just close.

I can probably manage the coding. It would be sloppy and primitive, but I can probably do it with maybe a little help. What I'm asking for here is not the coding so much, although that is welcome too, but really a general plan of how to go at this.

My first thought was to hold each body in turn in an array and then, holding that, make arrays one by one of the other bodies and compare each as I go along. Or maybe something with hashes, since they have that handy property of identical keys collapsing into one. Then again, it might be quicker to slurp each body as a string and use a uniquifier like:

    if ($data{$_}++ == 0) { (do something with $_); }

But that might have problems with differing numbers of blank lines. Although I do expect the bodies to be identical, I'm not really 100% sure about that either. I may have to include some leveling code to chomp the lines and remove any blank ones so that the number of blank lines isn't considered.

Any guidance gratefully accepted.
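To make the hash idea concrete, here is a minimal sketch, not a finished tool. It assumes each file holds exactly one message, that the headers end at the first blank line, and that blank lines and trailing whitespace in the body can safely be ignored. The names body_digest and group_by_body are made up for illustration. Digest::MD5 is a core module, used here only as a compact hash key. Each file is read once and filed under the digest of its normalized body, so no body is ever compared pairwise against the others.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: digest of the message body only.
# Assumption: headers end at the first blank line. Blank lines and
# trailing whitespace (including \r) are dropped before hashing.
sub body_digest {
    my ($text) = @_;
    $text = '' unless defined $text;
    my (undef, $body) = split /\r?\n\r?\n/, $text, 2;
    $body = '' unless defined $body;    # no blank line: treat as empty body
    my @lines;
    for (split /\n/, $body) {
        my $line = $_;
        $line =~ s/\s+\z//;             # strip trailing whitespace
        push @lines, $line if length $line;
    }
    return md5_hex(join "\n", @lines);
}

# Hypothetical helper: map each body digest to the list of files that
# share it, across any number of directories.
sub group_by_body {
    my (@dirs) = @_;
    my %seen;                           # digest => [ paths ]
    for my $dir (@dirs) {
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        for my $name (sort readdir $dh) {
            my $path = "$dir/$name";
            next unless -f $path;       # skip ., .. and subdirectories
            open my $fh, '<:raw', $path or die "Cannot read $path: $!";
            my $text = do { local $/; <$fh> };   # slurp the whole file
            close $fh;
            push @{ $seen{ body_digest($text) } }, $path;
        }
        closedir $dh;
    }
    return \%seen;
}

# Report every body that occurs in more than one file.
if (@ARGV) {
    my $seen = group_by_body(@ARGV);
    for my $digest (sort keys %$seen) {
        my @paths = @{ $seen->{$digest} };
        print join("\n  ", "Duplicate body:", @paths), "\n" if @paths > 1;
    }
}
```

Saved under some name such as dupbodies.pl (hypothetical), it would be run as `perl dupbodies.pl dir1 dir2`. Duplicates within one directory and across both end up in the same group, since every file goes into the same hash.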