Alan,

Tools already exist that meet your need more directly.  Any Unix-like
system has command-line tools that can do most of this analysis.  I'd start
with "diff -b -B -w", but you can also use "comm".  The comm tool requires
sorted input, though, which throws away line order, so you might want to
drop "empty" lines and boilerplate lines like </head> that would otherwise
match in almost every file.
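
If you would rather script the "score" than read diff output by hand, here
is a minimal sketch in Python using the standard difflib module.  It assumes
both projects still use the same relative file paths, and it collapses
whitespace and skips blank lines, roughly in the spirit of "diff -b -B".
The function names (normalized_lines, similarity, report) are made up for
illustration; none of this is part of Meld or diff.

```python
import difflib
from pathlib import Path


def normalized_lines(path):
    """Return the file's lines with whitespace collapsed and blank lines dropped."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def similarity(path_a, path_b):
    """Similarity in [0, 1] between two files, compared line by line."""
    a, b = normalized_lines(path_a), normalized_lines(path_b)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def report(original_dir, suspect_dir, top=50):
    """Score every file that exists under the same relative path in both trees.

    Assumption: files were not renamed or moved. Files present in only one
    tree are skipped and do not count toward the average.
    """
    original_dir, suspect_dir = Path(original_dir), Path(suspect_dir)
    scores = []
    for path in sorted(original_dir.rglob("*")):
        rel = path.relative_to(original_dir)
        other = suspect_dir / rel
        if path.is_file() and other.is_file():
            scores.append((similarity(path, other), str(rel)))
    scores.sort(reverse=True)
    # Overall score: plain average of the per-file ratios.
    total = sum(score for score, _ in scores) / len(scores) if scores else 0.0
    return total, scores[:top]
```

Calling report("ours", "his") with your two checkout directories (placeholder
names) returns the average score and the 50 closest files.  Treat the numbers
as a rough guide only: renamed files are missed, and boilerplate lines still
count as matches unless you filter them out first.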

There are some plagiarism-detector tools that may also help, but I don't
have any experience with those.

Feel free to contact me off-list if you need more specific guidance.
Phil


On Wed, Sep 27, 2017 at 2:49 PM Alan Halls <[email protected]> wrote:

> I am involved in a legal matter regarding an employee's theft of trade
> secrets. In particular, he stole the source code for a website that he and 2
> other programmers worked on for 2 years.
>
> I now have a copy of his project, and of course a copy of mine. I found
> the software Meld, which seems to do a great job on a one-by-one basis, but
> it would be very time-consuming to try to end up with any "score" of how
> much of our original code is still in his existing project.
>
> He was sloppy, and his launched public website still has our company info
> in the 404 page, which links you to the about us, pricing, docs, and
> contact us pages, all of which still have the original code in them. So
> there is no question about whether or not he did it, just how much "custom"
> work he did for himself.
>
> I was kind of imagining a report with a total score, then the top 50
> matches with each of their scores. Has anyone thought of adding that in? It
> seems that all that info would be available already in the program, just
> needing a view for it to display on.
>
> _______________________________________________
> meld-list mailing list
> [email protected]
> https://mail.gnome.org/mailman/listinfo/meld-list
