Quite true.  In my case the resync points (after ignoring or stripping header 
lines) can be as little as a blank line followed by a single matching record in 
the areas of the file where the differences occur, so maybe SuperC isn't seeing 
enough lines of resync to work correctly.

OTOH the GNUWIN32 diff utility on my laptop found all of the differences 
without a problem, so it can be done, just apparently not by SuperC.
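
For anyone wondering why a diff utility can cope with this kind of file: GNU diff is documented as being built on a longest-common-subsequence style algorithm (Myers), which looks at the two files as a whole rather than hunting forward for the first resync point. The Python below is only a rough sketch of that idea using a plain dynamic-programming table. It is not GNU diff's actual code, it says nothing about how SuperC works internally, and the function name lcs_diff and the sample records are made up for illustration.

```python
def lcs_diff(a, b):
    """Return a list of (tag, record) pairs, tag being " ", "-" or "+",
    using as few "-" and "+" entries as possible."""
    m, n = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top, always stepping in the direction
    # that keeps the longest common subsequence still reachable.
    script = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is pure deletion or insertion.
    script.extend(("-", rec) for rec in a[i:])
    script.extend(("+", rec) for rec in b[j:])
    return script


# Hypothetical records separated by blank lines, as in the case above.
old = ["A", "B", "", "C", "", "D"]
new = ["A", "X", "", "Y", "", "B", "", "C", "", "D"]
for tag, rec in lcs_diff(old, new):
    print(tag, repr(rec))
```

The table costs len(a) * len(b) entries of memory and time, which is the resource trade-off: fine for modest files, expensive for very large ones unless a smarter variant is used.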

Peter

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Joel C. Ewing
Sent: Tuesday, January 08, 2013 2:23 PM
To: [email protected]
Subject: Re: SuperC utility loses its place when comparing text files with many 
differences

<Snipped>

This is an inherent problem with any algorithm which attempts to 
flag/detect changes between files and do it with minimum resource 
consumption.  Your typical algorithm compares two files and when a 
difference is detected starts looking through subsequent records in both 
files for a "resync" point where both files match up again, with trivial 
false resync points typically eliminated by requiring a resync point to 
have "n" consecutive records match.  The problem is that for any "n", 
one can create cases where "false" matches would be found, or if the 
changes are too massive, no resync point may be found.  We humans would 
like the algorithm to find the "best" resync point, which would probably 
be defined as one which minimizes the total changes reported for the two 
files, but it is unclear how one could compute this without a recursive 
algorithm that exhaustively tries all possible resync points and doesn't 
just accept the first one found.  Humans looking at two files that have 
frequently recurring patterns interspersed with unique records can 
intuitively tune out the repetitive patterns and look at the unique 
records when looking for a best resync point, but building those smarts 
into a formal algorithm may not be feasible.
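
The resync approach described above can be sketched roughly as follows in Python. This is only an illustration of the general technique, not the code of SuperC or any other product; the names find_resync and simple_diff, the search window, and the sample records are all assumptions made up for the example. It accepts the first resync point it finds, which is exactly what lets a small "n" go wrong.

```python
def find_resync(a, b, i, j, n, window=50):
    """Return offsets (di, dj) of the nearest point past a[i], b[j] where
    n consecutive records match, or None if none lies within the window."""
    # Candidates are tried in order of increasing total skip distance,
    # and the first hit is accepted without looking for a better one.
    for total in range(1, 2 * window + 1):
        for di in range(0, total + 1):
            dj = total - di
            if di > window or dj > window:
                continue
            x, y = i + di, j + dj
            if x + n <= len(a) and y + n <= len(b) and a[x:x + n] == b[y:y + n]:
                return di, dj
    return None


def simple_diff(a, b, n=2, window=50):
    """Return a list of (records_only_in_a, records_only_in_b) pairs,
    one pair per difference region."""
    i = j = 0
    changes = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        hit = find_resync(a, b, i, j, n, window)
        if hit is None:
            # No resync point found: report the rest as one big change.
            changes.append((a[i:], b[j:]))
            return changes
        di, dj = hit
        changes.append((a[i:i + di], b[j:j + dj]))
        i += di
        j += dj
    if i < len(a) or j < len(b):
        changes.append((a[i:], b[j:]))
    return changes


# Hypothetical data: blank lines recur, so they make tempting false matches.
old = ["A", "B", "", "C", "", "D"]
new = ["A", "X", "", "Y", "", "B", "", "C", "", "D"]
print(simple_diff(old, new, n=1))  # resyncs on blank lines: three regions
print(simple_diff(old, new, n=2))  # one region: the four inserted records
```

With n=1 the blank lines act as false resync points and "B" and "C" are reported as changed even though they were not; with n=2 the same data yields the single insertion. Raising "n" fixes this case but would miss a genuine resync point that is only one record long, which is the trade-off described above.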

-- 
Joel C. Ewing,    Bentonville, AR       [email protected] 
