Thanks for digging into the regex meaning of that second ‘+’ in '\b(\w+)+\1\b’.

As it turned out, the OP needed to find repeated words, not characters, so 
replacing the second plus sign with a space works for them.
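
For anyone who wants to check the space version outside BBEdit, here is a 
minimal sketch in Python. It assumes Python's re module treats this simple 
pattern the same way BBEdit's grep does, and the helper name 
collapse_repeated_words is just something I made up for illustration.

```python
import re

# Hypothetical helper (not a BBEdit feature): find a word followed by a
# single space and the same word, and keep only one copy of it.
REPEATED_WORD = re.compile(r"\b(\w+) \1\b")

def collapse_repeated_words(text: str) -> str:
    # The replacement \1 is the captured word, so the duplicate is dropped.
    return REPEATED_WORD.sub(r"\1", text)

print(collapse_repeated_words("My sentence is abc abc for def def."))
```

This handles only a single space between the two copies; other whitespace 
or punctuation between them would need a different pattern.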

Also, I’m not sure whether you’re suggesting this, but at the end of your 
comment you mention the pattern '\b\w*(\w+)\1\b’. The leading zero or more 
word characters (\w*) sit outside the capture group, so they won’t be 
captured and won’t appear in the replacement text. Is that what you meant?
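
To make the question concrete, here is a small sketch in Python rather than 
BBEdit. It assumes Python's re module handles these patterns the way BBEdit's 
grep does for these cases. The sample words come from your Pattern Playground 
results, and the constant and function names are mine.

```python
import re

# The three find patterns discussed in this thread.
WITH_SECOND_PLUS = r"\b(\w+)+\1\b"     # the OP's original
WITH_LEADING_STAR = r"\b\w*(\w+)\1\b"  # the proposed equivalent
WITHOUT_SECOND_PLUS = r"\b(\w+)\1\b"   # second plus removed

def replace_with_group(pattern: str, text: str) -> str:
    # Mimics a Replace All whose replace pattern is just capture group 1.
    return re.sub(pattern, r"\1", text)

for word in ("facilisis", "Underhill", "11"):
    print(
        word,
        replace_with_group(WITH_SECOND_PLUS, word),
        replace_with_group(WITH_LEADING_STAR, word),
        replace_with_group(WITHOUT_SECOND_PLUS, word),
    )
```

With the first two patterns the whole word is matched but only the last 
repeated piece is captured, so 'facilisis' is replaced by 'is' and the 
leading 'facil' is lost. The third pattern leaves 'facilisis' and 
'Underhill' alone and only changes '11'.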

Best,

    — Bruce

_bruce__van_allen__santa_cruz_ca_


> On Dec 15, 2024, at 3:50 PM, GP <[email protected]> wrote:
> 
> First, with BBEdit 15.1.3 (15B62, Apple Silicon), I didn't get any error with 
> ce gm's grep find and replace.
> 
> That said, however, I found the second + is doing something in the find and 
> replace operation.
> 
> For test text I used Howard's posted sample records from the "Sorting 
> multiple records in a text file" thread. In the Pattern Playground, with the 
> find pattern '\b(\w+)+\1\b’ (without the quotes) and the replace pattern \1, 
> 7 matches were found:
> 0 -> facilisis
> 1 -> is
> replacement -> is
> 
> 0 -> Underhill
> 1 -> l
> replacement -> l
> 
> 0 -> 11
> 1 -> 1
> replacement -> 1
> 
> 0 -> Afterall
> 1 -> l
> replacement -> l
> 
> 0 -> 11
> 1 -> 1
> replacement -> 1
> 
> 0 -> 22
> 1 -> 2
> replacement -> 2
> 
> 0 -> Afterall
> 1 -> l
> replacement -> l
> 
> whereas, with the find: '\b(\w+)\1\b’ (without the second + and without the 
> quotes) and same replace pattern, only 3 matches were found:
> 0 -> 11
> 1 -> 1
> replacement -> 1
> 
> 0 -> 11
> 1 -> 1
> replacement -> 1
> 
> 0 -> 22
> 1 -> 2
> replacement -> 2
> 
> According to https://regex101.com's explanation, the difference is due to the 
> capturing group workings of the (\w+)+ part of the regular expression: "A 
> repeated capturing group will only capture the last iteration." So, if I'm 
> not mistaken, (\w+)+ behaves like \w*(\w+), and the 
> equivalent find grep is \b\w*(\w+)\1\b . That would match any word string 
> containing zero or more word characters followed by a capturing group of one 
> or more word characters followed by a single repeat of the captured group of 
> characters. According to regex101.com's Regex Debugger there's a whole lot of 
> backtracking going on to find all the matches with the \b\w*(\w+)\1\b grep.
> 
> On Saturday, December 14, 2024 at 3:07:35 PM UTC-8 Bruce Van Allen wrote:
> Hi, 
> 
> An example of the text and a description of what you’re trying to accomplish 
> would help. 
> 
> From your find pattern, I’m guessing you’re trying to find cases where a 
> string is followed by the same string, to be replaced by just one instance of 
> the string. 
> 
> '\b(\w+)+\1\b’ (your original - without the quotes) 
> 
> Your find pattern’s second plus sign ‘+’ isn’t doing anything, because the 
> first one, which quantifies the ‘\w’, is grabbing every consecutive 
> word/alphanumeric character including any repetitions. 
> 
> Removing that second ‘+', the find pattern '\b(\w+)\1\b’ (without the quotes) 
> will find a string of word characters followed immediately by the same 
> string, as in ‘My sentence is abcabc for defdef.’ Using your replacement 
> pattern of ‘\1’, this will become ‘My sentence is abc for def.’ 
> 
> Guessing that you’re actually looking for duplicated WORDS: if the find 
> pattern has a spacebar space ‘ ‘ between the group and the backreference, 
> it will find any word followed by a space and then the same exact word, and 
> the replacement will eliminate the duplication. 
> 
> With find pattern '\b(\w+) \1\b’, your replacement pattern makes 'My sentence 
> is abc abc for def def.’ into 'My sentence is abc for def.’ 
> 
> If you want to find a string of word characters that matches an earlier 
> instance of the same string but separated by more than just a space, your 
> pattern may be more complicated. 
> 
> HTH and please clarify if my guesses are wrong. 
> 
> — Bruce 
> 
> _bruce__van_allen__santa_cruz_ca_ 
> 
> 
> > On Dec 14, 2024, at 1:43 PM, ce gm <[email protected]> wrote: 
> > 
> > Hello there, 
> > 
> > I am doing a GREP search on a .txt file in Bbedit on my Mac. Here are the 
> > find/replace terms: 
> > Find: \b(\w+)+\1\b 
> > Replace: \1 
> > 
> > When I input the Find term, it correctly identifies the targets in the 
> > preview (highlights them in yellow). Then, when I push Replace All, I get a 
> > pop up with Application Error Code: 12247 and nothing else. 
> > 
> > Anyone know what this means? A cursory Google search was not helpful. 
> > 
> > Thanks! 
> > 
> > -- 
> > This is the BBEdit Talk public discussion group. If you have a feature 
> > request or believe that the application isn't working correctly, please 
> > email "[email protected]" rather than posting here. Follow @bbedit on 
> > Mastodon: <https://mastodon.social/@bbedit> 
> > --- 
> > You received this message because you are subscribed to the Google Groups 
> > "BBEdit Talk" group. 
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to [email protected]. 
> > To view this discussion visit 
> > https://groups.google.com/d/msgid/bbedit/c9e18d6f-f5c4-467e-9c01-fa4ffbaa5485n%40googlegroups.com.
> >  
> 
> 

To view this discussion visit 
https://groups.google.com/d/msgid/bbedit/FE4B0DC5-B80C-4152-BE95-C2C74E7DBD1A%40cruzio.com.