On Jun 28, 2022, at 8:51 AM, David Erlandson <david.erland...@rice.edu> wrote:
> I have a colleague who is looking to track changes in text of a manuscript
> that has 4 revisions. Apparently there are pretty major changes to the
> content and it would be great to identify them.
>
> I was thinking through tools I'm familiar with (generally line by line
> comparisons) but that would seem to have the pitfall of an early large
> revision throwing off the comparison for the rest of the text. Another silly
> thought was to start up a local wiki instance and overlay each version; use
> the built in compare tools... Has anyone worked on a project like this? Or
> are there any tools built and ready to go? Any guidance would be appreciated.

If I understand the question correctly, then I believe you need to do what is sometimes called "collocation", and I used a JavaScript library called TRAViz to accomplish a similar task. [1]

More specifically, I had two sets of files, and each set was a translation of the Psalms: one translated in 1610 and the other in 1700. [2] I wanted to see how the two translations were similar and different. Corresponding files in each set were similarly named. I then wrote a Python script that loops through the translations and outputs an HTML file. [3] The HTML file is highly structured, calls TRAViz, and outputs a visualization illustrating where the two translations differed and converged.

You can temporarily see the results of these labors online, but be forewarned: TRAViz is doing a lot of work against many paragraphs, so rendering is slow. [4]

HTH

[1] TRAViz - http://www.traviz.vizcovery.org
[2] Psalms - http://dh.crc.nd.edu/tmp/collocations/psalms/
[3] Python script - http://dh.crc.nd.edu/tmp/collocations/bin/psalms2html.py
[4] results - http://dh.crc.nd.edu/tmp/collocations/html/

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

574/631-8604
https://cds.library.nd.edu
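
P.S. As a rough illustration of the word-level (rather than line-by-line) comparison discussed above, here is a minimal Python sketch that uses only the standard library's difflib module. It is not the psalms2html.py script and it does not call TRAViz; the function names (word_diff, compare_revisions) and the sample sentences are hypothetical, and splitting on whitespace is an assumed, simplistic way to tokenize. Because the comparison aligns words instead of lines, a large early insertion should not throw off the alignment of the text that follows it.

```python
import difflib


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) tuples describing word-level changes.

    The tag is one of difflib's opcode names: 'equal', 'replace',
    'insert', or 'delete'.
    """
    # Assumption: whitespace tokenization is good enough for a first pass.
    old_words = old_text.split()
    new_words = new_text.split()

    # autojunk=False turns off the heuristic that treats very frequent
    # items as junk in long sequences.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


def compare_revisions(revisions):
    """Compare each revision with the one that follows it."""
    return [word_diff(a, b) for a, b in zip(revisions, revisions[1:])]


if __name__ == "__main__":
    # Hypothetical sample text, standing in for two manuscript revisions.
    revisions = [
        "The Lord is my shepherd I shall not want",
        "The Lord is my shepherd I shall lack nothing",
    ]
    for tag, old, new in compare_revisions(revisions)[0]:
        if tag != "equal":
            print(tag, "|", " ".join(old), "->", " ".join(new))
```

For a manuscript with four revisions, compare_revisions would return three lists of changes, one for each consecutive pair of versions. Each list could then be written out as HTML or fed to a visualization tool.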