Hi all
While copyediting a text for a scholarly book (500+ pages when printed), I
noticed that the author wrote exactly the same long sentence (= an
identical string of 337 characters) once on page 23 and once on page 326.
No doubt this happened because the author copied and pasted some text from
his notes, unaware that he had already copied and pasted the same text
earlier. I thought it would be a good idea to find out whether this has
happened more than once in his 1,000,000-character book, so that I can
alert him (and give him a chance to remove the repetitions).
So I turned to BBEdit. The text of the whole book is now in a single .txt
file. When I search for the sentence that is on page 23 of the Word
document, BBEdit finds it in both paragraph 117 and paragraph 7831. What
regular expression can I use to find other such repetitions?
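Because the duplication here is at the level of whole sentences, one way to look for further cases outside BBEdit is to split the text into sentences and count identical ones. The Python sketch below assumes a naive sentence boundary (a run of text ending in ".", "!" or "?"), so abbreviations such as "e.g." will end a "sentence" early. The function name find_repeated_sentences and the 200-character threshold are illustrative choices, not anything BBEdit provides.

```python
import re
from collections import defaultdict


def find_repeated_sentences(text, min_length=200):
    """Map each sentence of at least min_length characters that occurs more
    than once in text to the list of offsets where it starts."""
    offsets = defaultdict(list)
    # Naive sentence split (an assumption): start at a non-space character,
    # run up to one or more of . ! ? (or the end of the text).
    # Line breaks do not end a sentence.
    for m in re.finditer(r"[^\s.!?][^.!?]*(?:[.!?]+|$)", text):
        # Collapse runs of whitespace (including line breaks) so the same
        # sentence compares equal even if it is wrapped differently.
        sentence = " ".join(m.group().split())
        if len(sentence) >= min_length:
            offsets[sentence].append(m.start())
    return {s: found for s, found in offsets.items() if len(found) > 1}


sample = "Alpha beta gamma. Delta epsilon. Alpha beta\ngamma. Zeta."
print(find_repeated_sentences(sample, min_length=10))
# {'Alpha beta gamma.': [0, 33]}
```

With the book saved as, say, book.txt (a hypothetical filename), calling find_repeated_sentences(open("book.txt", encoding="utf-8").read()) would list each long sentence that occurs two or more times, with the character offset of every occurrence.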
I tried using the following string:
(?s)(.{200}).*?\1
This is what I understand it to mean (roughly):
(?s): search across paragraphs
(.{200}).*?: search for, and capture, a string of 200 characters,
optionally followed by any characters
\1: stop the search as soon as you reach a second instance of the captured
string
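That reading of the pattern is essentially right: (?s) lets "." match line breaks, so a match can run across paragraphs, and ".*?" is lazy, so "\1" is tried at the nearest possible position first. The same pattern can be tried outside BBEdit, for example with Python's re module, which also supports inline flags, lazy quantifiers and backreferences. It is a different regex engine from BBEdit's, so treat this only as an illustration of what the pattern matches; the helper name first_repeat is hypothetical.

```python
import re


def first_repeat(text, width):
    """Apply (?s)(.{width}).*?\\1 and return (first_offset, second_offset,
    repeated_text) for the earliest window that occurs again, or None."""
    m = re.search(r"(?s)(.{%d}).*?\1" % width, text)
    if m is None:
        return None
    # The match runs from the start of the first copy to the end of the
    # second copy, so the second copy starts `width` characters before m.end().
    return m.start(), m.end() - width, m.group(1)


text = "The quick brown fox. Something else. The quick brown fox."
print(first_repeat(text, 15))  # (0, 37, 'The quick brown')
print(first_repeat(text, 25))  # None: no 25-character stretch occurs twice
```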
The string does what I need if I replace 200 with a smaller number, such
as 10 (though then, of course, BBEdit also finds lots of harmless
repetitions). Since the sentence I have in mind is more than 300
characters long, I should even have been able to use 300 instead of 200.
Unfortunately, something seems to be amiss: BBEdit kept searching and
searching without finding anything, my notebook's fan started running,
and after about 20 minutes it became clear that nothing would happen and
that I had no choice but to force quit BBEdit.
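A likely explanation for the hang is how much work this pattern implies when no repeat is nearby. For each starting position, the engine captures 200 characters, then lets ".*?" grow one character at a time, trying "\1" at every later position up to the end of the file before giving up and moving the start forward by one. With 10 characters some short phrase usually repeats soon, so the search stops early. With 200, almost every start scans the rest of the book. The Python sketch below (hypothetical helper names) mirrors that search order without any of the optimizations a real engine, BBEdit's included, may or may not apply, and counts roughly how many comparisons that means.

```python
def naive_attempts(text, width):
    """Count the positions at which a plain left-to-right search, mirroring
    (?s)(.{width}).*?\\1, tries to match the second copy before it succeeds
    (or gives up)."""
    n = len(text)
    attempts = 0
    for i in range(n - width + 1):                 # start of captured window
        window = text[i:i + width]
        for j in range(i + width, n - width + 1):  # where \1 is tried
            attempts += 1
            if text[j:j + width] == window:
                return attempts
    return attempts


def worst_case_attempts(n, width):
    """Closed form of naive_attempts when no window of `width` repeats."""
    m = max(0, n - 2 * width + 1)
    return m * (m + 1) // 2


assert naive_attempts("abcdefgh", 2) == worst_case_attempts(8, 2) == 15
print(worst_case_attempts(1_000_000, 200))  # 499601579401, about 5 * 10**11
```

The count grows with the square of the file length. Even when a repeat does exist, as with the sentence in paragraph 117, the search must first fail from every earlier starting position, each of which scans to the end of the file.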
So my question is, what's wrong with the above string? How else can I find
a repeated 200-character sentence in a large text file?
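A different technique avoids the quadratic scan altogether: record every 200-character window in a hash table as you pass it, and report a window as soon as it has been seen before. The Python sketch below does that; the function name and the (earlier, later, length) output format are hypothetical choices. It extends each hit to its full length, so one long repeated sentence is reported once rather than once per overlapping window. It compares characters exactly, so two copies that differ only in line breaks or spacing will not be matched unless the text is normalized first.

```python
def find_repeated_stretches(text, width=200):
    """Return (earlier_offset, later_offset, length) for each stretch of at
    least `width` characters in `text` that repeats an earlier stretch."""
    seen = {}      # hash of a window -> offsets of windows with that hash
    repeats = []
    n = len(text)
    i = 0
    while i + width <= n:
        window = text[i:i + width]
        key = hash(window)
        earlier = None
        for j in seen.get(key, ()):
            if text[j:j + width] == window:   # rule out hash collisions
                earlier = j
                break
        if earlier is None:
            seen.setdefault(key, []).append(i)
            i += 1
            continue
        # Grow the match character by character so that one long repeated
        # sentence is reported once, not once per overlapping window.
        length = width
        while i + length < n and text[earlier + length] == text[i + length]:
            length += 1
        repeats.append((earlier, i, length))
        # Every window starting inside the repeated stretch would match
        # again, so resume just past the last of them.
        i += length - width + 1
    return repeats


text = "The quick brown fox. Something else. The quick brown fox."
print(find_repeated_stretches(text, width=15))  # [(0, 37, 20)]
print(find_repeated_stretches(text, width=25))  # []
```

For the book, find_repeated_stretches(open("book.txt", encoding="utf-8").read()) (again with a hypothetical filename) does roughly one dictionary lookup per character position, so a 1,000,000-character file should be manageable, although memory use grows with the length of the file.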
Thanks
Sam
--
This is the BBEdit Talk public discussion group. If you have a feature request
or need technical support, please email "[email protected]" rather than
posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>