Another "occasional user" question, from someone without enough time to learn
all the cool tools BBEdit and regex offer that would solve my problem.
Context: I have about 500 KB of video transcripts (43 .txt files) to edit
and refine. Timecodes have been removed from all files.

I want to find all instances of doubled words but ignore a subset of those
matches: search for doubled words in a video transcript, but EXCLUDE
"many, many" and "very, very". In effect, this will reduce instances of
stuttering in the transcript while leaving the intentional repeats intact.

This search string finds doubled words separated by a comma and a space,
which covers most of the instances of doubled words:
(\b[A-Za-z]+\b),\s\1 

replace with
\1 
e.g., find "what, what" and replace with "what" in the string:
"So when we talk about the structure of data that describes what, what
identifies our columns"

But do not replace "very, very" in the string:
"or even the greater distance away from zero, is very, very small."
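
One possible way to express "doubled word, except these words" is a negative
lookahead placed in front of the capture group. The sketch below tries the idea
in Python rather than in BBEdit itself. BBEdit's Grep search is PCRE-based, so
the same pattern is expected to behave the same way there, but that is an
assumption that has not been checked in BBEdit. The function name
collapse_doubled and the sample strings are illustrative only. The pattern also
adds a trailing \b, which the original search string lacks, so that a pair such
as "the, then" is not treated as a doubled word.

```python
import re

# (?!(?:many|very)\b) is a negative lookahead: the match is abandoned when the
# word about to be captured is "many" or "very" (the intentional repeats).
# The trailing \b keeps \1 from matching only the start of a longer word.
DOUBLED = re.compile(r"\b(?!(?:many|very)\b)([A-Za-z]+),\s\1\b")


def collapse_doubled(text: str) -> str:
    """Replace 'word, word' with 'word', skipping the excluded words."""
    return DOUBLED.sub(r"\1", text)


if __name__ == "__main__":
    samples = [
        "data that describes what, what identifies our columns",
        "away from zero, is very, very small.",
        "there are many, many examples",
        "the, then we move on",
    ]
    for line in samples:
        print(collapse_doubled(line))
```

In BBEdit the equivalent would presumably be to enter the same pattern in the
Find field with Grep enabled and keep \1 as the replacement. Further words to
leave alone can be added to the (?:many|very) alternation.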


Thank you for any hints on doing this.

Glenn

 
