Grep question - matching duplicate blocks of lines

Todd Ruston Mon, 05 Mar 2007 07:04:11 -0800

Greetings,

I'm trying to figure out a grep pattern to search for blocks of
contiguous duplicate lines and eliminate the duplicates. Example:


   text A
   text A
   text 1
   text 2
   text 3
   text 1
   text 2
   text 3
   text I
   text A

After processing, the file should read

   text A
   text 1
   text 2
   text 3
   text I
   text A

The actual text is hard-wrapped paragraphs, and the lines vary from
~5 to 90 characters long (or thereabouts). There is white space (3
or more spaces, as illustrated above) at the beginning of populated
lines. There can be blank lines (just a return) or white space lines
(lines with only space characters terminated by a return), and they
should be given the same status as populated lines (i.e. when
identifying duplicate blocks, white space is consistent and
significant).

My current attempt is:

(?s)(^.+)\1

but that also matches repeated runs of characters at the beginning
of a line, not just the desired duplicate blocks. Is there a
modification to this (or another approach) that would make it compare
only complete lines? Thanks for any assistance you can offer.
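
To make the goal concrete, here is a minimal sketch in Python (not
BBEdit's grep dialect, whose line-ending handling may differ). It
assumes the idea is to force the captured group to consist of whole
lines, each ending in a newline, so the backreference can only repeat
complete lines. The names DUP_BLOCK and collapse_duplicate_blocks are
made up for illustration.

```python
import re

# Group 1 is one or more COMPLETE lines (each "[^\n]*\n"), taken lazily
# so the shortest repeating block is found first. "\1+" then requires
# that exact block to repeat immediately, one or more times.
DUP_BLOCK = re.compile(r"^((?:[^\n]*\n)+?)\1+", re.MULTILINE)


def collapse_duplicate_blocks(text):
    """Replace each run of contiguous duplicate line blocks with one copy."""
    # Assumption: every line must end in "\n" for the pattern to see it
    # as a complete line, so temporarily terminate an unterminated last line.
    added_newline = not text.endswith("\n")
    if added_newline:
        text += "\n"
    result = DUP_BLOCK.sub(r"\1", text)
    return result[:-1] if added_newline else result


if __name__ == "__main__":
    sample = (
        "   text A\n"
        "   text A\n"
        "   text 1\n"
        "   text 2\n"
        "   text 3\n"
        "   text 1\n"
        "   text 2\n"
        "   text 3\n"
        "   text I\n"
        "   text A\n"
    )
    print(collapse_duplicate_blocks(sample), end="")
```

Blank and white-space-only lines are treated like any other line
here, so two identical blank lines in a row collapse to one. This is
a single pass: collapsing one block can occasionally leave a new
duplicate behind, in which case the function would need to be applied
again until the text stops changing.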

- Todd

P.S. Anyone know when ListSearch will be back online?
