Thanks Jag! I will certainly look into Levenshtein. I found this tool (https://www.safe-corp.com/products_codematch.htm), but it costs up to $400/MB (https://www.safe-corp.com/documents/CodeSuite%20pricing.pdf). It seemed like something Meld would be perfect for with minimal effort, and a feature like this could attract a whole new group of power users to Meld, maybe even some with funding behind them to improve it.
I have a part-time .NET programmer coming by this afternoon, and I may have him look at extracting those stats, but I'm not sure how realistic that is as an afternoon project for someone unfamiliar with the code base.

Alan

On Thu, Sep 28, 2017 at 9:33 AM, Jaggz H <[email protected]> wrote:
> Halls,
>
> 1. You might do yourself some good by doing some coding of your own, if you
> can, possibly using a combination of shell tools and code. I'd recommend
> doing this, assuming you're the one in the right :), because you'll be able
> to get the custom stats you need to strengthen your case, without being
> limited to someone else's tools.
> 2. That being said, a few stats might be useful to some people in Meld. I
> wonder if kdiff3 outputs stats; kdiff3 is another GUI diff/merge tool. I use
> both Meld and kdiff3.
> 3. Also, maybe look into the Levenshtein text-difference algorithm. In Perl
> I use Text::Levenshtein (_XS). It provides a character distance between two
> texts (i.e. how many single-character edits are needed to turn one into the
> other), which then readily translates to a percentage. In that respect, it
> relates more directly to the amount of change than line counts do.
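Jag's point 3 can be sketched in Python. This is a swapped-in equivalent, not the API of the Perl Text::Levenshtein module he mentions. Turning the distance into a percentage by dividing it by the longer text's length is an assumption, one common normalization among several. The classic dynamic-programming approach below takes time proportional to the product of the two lengths, so it suits individual files or lines rather than whole source trees.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Assumption: 100% means identical, 0% means every character differs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(round(similarity_percent("kitten", "sitting"), 1))  # 57.1
```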
>
> Jag
>
> On Sep 28, 2017 7:09 AM, "Alan Halls" <[email protected]> wrote:
>
>> Thanks, Phil, for the response. I guess I was thinking of a debug report
>> such as:
>> Files Analyzed: 19,543
>> Folders Analyzed: 343
>> Total lines of code analyzed: 1,544,346
>> Total lines of code in source: 1,244,346
>> Total lines of code in destination: 1,944,346
>> Total lines with exact matches: 856,644
>> Unique lines in source: 400,546
>> Unique lines in destination: 850,546
>> Similarity of source to destination: 45%
>> Exact matches of greater than 25 contiguous lines of code: 943
>> Exact matches of greater than 5 contiguous lines of code: 46,733
>>
>> I looked into the plagiarism-detector tools and haven't found anything
>> yet that handles PHP. The command-line diff tools "should" be able to
>> output this type of report; I just figured that all of this info, with the
>> exception of the last two items, would already be tracked in the software
>> and would just need to be output somewhere.
>>
>> Alan
>>
>> On Wed, Sep 27, 2017 at 4:14 PM, Phil Hord <[email protected]> wrote:
>>
>>> Alan,
>>>
>>> Tools already exist that more directly meet your need. Any Unix-like
>>> system will have command-line tools to do most of this analysis. I'd start
>>> with "diff -b -B -w", but you can also use "comm". The comm tool relies on
>>> the files being sorted, though, so you might want to ignore "empty" lines
>>> or common lines like </head>, for example.
>>>
>>> There are some plagiarism-detector tools that may also help, but I don't
>>> have any experience with those.
>>>
>>> Feel free to contact me off-list if you need more specific guidance.
>>> Phil
>>>
>>>
>>> On Wed, Sep 27, 2017 at 2:49 PM Alan Halls <[email protected]> wrote:
>>>
>>>> I am involved in a legal matter regarding an employee's theft of trade
>>>> secrets. In particular, he stole the source code for a website that he
>>>> and two other programmers worked on for two years.
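Phil's "comm" suggestion and the line-count fields in Alan's report (exact matches, unique lines in source and destination, similarity) could be approximated with a short Python sketch. It treats each text as a multiset of whitespace-normalized lines, roughly what "comm -12" reports on sorted input, and skips blank lines plus a hypothetical list of trivial lines such as </head>. Defining similarity as the share of source lines that also appear in the destination is an assumption. Line order is ignored, so this does not detect contiguous copied blocks.

```python
from collections import Counter

# Hypothetical list of lines too generic to count as evidence of copying;
# it would need tuning for the code base in question.
TRIVIAL_LINES = {"</head>", "</body>", "</html>", "<?php", "?>", "{", "}"}


def normalize(line):
    """Collapse runs of whitespace, loosely like `diff -b`."""
    return " ".join(line.split())


def meaningful_lines(text):
    """Count normalized lines, skipping blank and trivial ones."""
    counts = Counter()
    for raw in text.splitlines():
        line = normalize(raw)
        if line and line not in TRIVIAL_LINES:
            counts[line] += 1
    return counts


def line_stats(source, destination):
    """Order-insensitive, comm-style comparison of two texts."""
    src = meaningful_lines(source)
    dst = meaningful_lines(destination)
    # Multiset intersection: a line appearing 3 times in one text and twice
    # in the other contributes 2 matches.
    matched = sum((src & dst).values())
    total_src = sum(src.values())
    total_dst = sum(dst.values())
    return {
        "lines_in_source": total_src,
        "lines_in_destination": total_dst,
        "exact_matches": matched,
        "unique_in_source": total_src - matched,
        "unique_in_destination": total_dst - matched,
        # Assumption: similarity = share of source lines also found in the destination.
        "similarity_percent": 100.0 * matched / total_src if total_src else 0.0,
    }
```

For example, line_stats("<?php\necho 'hi';\n$x = 1;\n", "<?php\n  echo   'hi';\n$y = 2;\n") reports one exact match and a similarity of 50.0. For whole trees, the contents of every file in each tree could be concatenated before calling line_stats.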
>>>>
>>>> I now have a copy of his project, and of course a copy of mine. I found
>>>> the software Meld, which seems to do a great job on a file-by-file basis,
>>>> but it would be very time-consuming to end up with any "score" of how
>>>> much of our original code is still in his existing project.
>>>>
>>>> He was sloppy: his launched public website still has our company info on
>>>> the 404 page, which links to the about us, pricing, docs, and contact us
>>>> pages, which all still have the original code in them. So there is no
>>>> question about whether he did it, just how much "custom" work he did for
>>>> himself.
>>>>
>>>> I was imagining a report with a total score, then the top 50 matches
>>>> with each of their scores. Has anyone thought of adding that? It seems
>>>> that all that info would already be available in the program, just
>>>> needing a view to display it.
>>>>
>>>> _______________________________________________
>>>> meld-list mailing list
>>>> [email protected]
>>>> https://mail.gnome.org/mailman/listinfo/meld-list
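The report Alan originally asked for, a list of the top 50 matching files with their scores, could be sketched with Python's standard difflib module. This sketch assumes files are paired by identical relative path under each tree and selected by a hypothetical "*.php" pattern, so renamed or moved files would be missed. The per-file score is SequenceMatcher.ratio() (2*M/T, where M is the number of matched lines and T the total lines in both files), and the contiguous-run counts come from get_matching_blocks(), which only finds matches that occur in the same order in both files.

```python
import difflib
import sys
from pathlib import Path


def read_lines(path):
    """Non-blank lines with whitespace collapsed (loosely like diff -b -B)."""
    # Assumption: files are mostly UTF-8 text; undecodable bytes are replaced.
    text = path.read_text(encoding="utf-8", errors="replace")
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def rank_pairs(original_root, suspect_root, pattern="*.php", top=50):
    """Score files present at the same relative path in both trees, best first.

    Returns (score, runs_over_25, runs_over_5, relative_path) tuples.
    """
    original = {p.relative_to(original_root): p for p in Path(original_root).rglob(pattern)}
    suspect = {p.relative_to(suspect_root): p for p in Path(suspect_root).rglob(pattern)}
    results = []
    for rel in sorted(original.keys() & suspect.keys()):
        matcher = difflib.SequenceMatcher(
            None, read_lines(original[rel]), read_lines(suspect[rel]), autojunk=False)
        # Sizes of in-order matching blocks; the final zero-size entry is a sentinel.
        sizes = [size for _, _, size in matcher.get_matching_blocks()]
        results.append((matcher.ratio(),
                        sum(size > 25 for size in sizes),
                        sum(size > 5 for size in sizes),
                        str(rel)))
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    for score, over25, over5, name in rank_pairs(sys.argv[1], sys.argv[2]):
        print(f"{score:6.1%}  runs>25: {over25:4d}  runs>5: {over5:4d}  {name}")
```

Saved as a script, it could be run with the original and suspect directories as its two arguments. SequenceMatcher can be slow on very large files, so a first pass over a subset of the tree is probably wise.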
