Chris Grau wrote:

> On Wed, Feb 14, 2007 at 08:25:49AM -0800, John H. Robinson, IV wrote:
>> Ralph Shumaker wrote:
>>> I've been compiling my script, using sed to do everything I was
>>> previously doing in vim. However, I've hit a snag. One thing that
>>> works in vim does *not* in sed.
>>>
>>> vim would strip out all unwanted line feeds with:
>>> ":%s/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/cg"
>>>
>>> In my script,
>>> "sed -e 's/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/g' 0035 >0036"
>>> doesn't change anything, so (as a test) I reduced it down to match
>>> one line in particular:
>>> "sed -e 's/e\nu/e u/g' 0035"
>>> and still no go. But reducing it to:
>>> "sed -e 's/e$/eeeeeee/g' 0035"
>>> or
>>> "sed -e 's/^u/uuuuuuu/g' 0035"
>>> works (except that it does nothing to the newline).
>>>
>>> Any suggestions?
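The sed attempts fail because sed reads its input one line at a time and removes the trailing newline before running any commands, so the pattern space never contains a \n to match (while $ and ^ still match at the line ends, which is why those two tests work). A common workaround, sketched here under the assumption of GNU sed, is to loop with N until the whole file sits in the pattern space; after that, \n matches the embedded line breaks. The function name join_eu is hypothetical:

```shell
#!/bin/sh
# join_eu: hypothetical wrapper around the reduced test case above.
# Assumes GNU sed; POSIX sed may quit without printing when N runs
# out of input on a one-line file.
join_eu() {
    # :a     define a label named "a"
    # N      append the next input line, joined by an embedded \n
    # $!ba   unless this is the last line, branch back to label a
    # After the loop the whole file is one pattern space, so the
    # final s command can see (and replace) the newlines.
    sed -e ':a' -e 'N' -e '$!ba' -e 's/e\nu/e u/g' "$@"
}

# Usage: join_eu 0035
```

The same :a / N / $!ba prefix can be placed in front of the longer substitution from the script, though that exact combination is untested here.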
>> Almost sounds like a job for perl. I will have to go back to the
>> original problem to see if a nice, clean perl one-liner can tend to
>> this.

> Going back to the original:
>
> ^1 This is a line
> that is broken by
> the super-imposed
> word wrapping.
>
> ^2 Short line.
>
> ^3 Another line.
>
> ^4 Yet another.
>
> ^5 Some lines:
> they wrap; Some
> lines: they don't.
>
> Fixing the line wrap is easy enough:
>
>    perl -lp0e 's{(?<!\n)\n}{ }xmsg' in.txt > out.txt
>
> This says, "replace any newline that doesn't immediately follow a
> newline with a space."  It does have a drawback.  Each line in the
> output file has a single space at the end.

How would you say "replace any newline that is neither preceded nor followed by a newline"? Of course, I suppose you could just run a second command to replace every " \n" with "\n".

In the command you give, what is "-lp0e"? Yes, yes, I know they are switches, but what do they do? How does the whole line read in standard SN (Stremler Notation®)? E.g.:

perl   the command
-      begin the switches
l      does this
p      does that
0      does yet
e      does another
'      begin the command string
s      the substitution command
{      begin the search set
...      etc.
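Going by the perlrun and perlre documentation, the command breaks down roughly as follows. It is written out here with the switches separated; the hypothetical function should behave the same as the bundled -lp0e form. One subtlety: -l copies the input separator into the output separator at the moment the switch is processed, before -0 changes it, which is why every output record still ends with a newline.

```shell
#!/bin/sh
# unwrap_spelled_out: hypothetical name; meant to be equivalent to
#   perl -lp0e 's{(?<!\n)\n}{ }xmsg' in.txt
#
# -l   chomp each record and set the output record separator ($\) to
#      the value $/ has when -l is seen, which is still "\n" here
# -p   loop over the input, run the code on each record, then print it
# -0   with no digits, set the input record separator ($/) to the NUL
#      character, so a file with no NUL bytes is read as one record
# -e   the next argument is the program text
#
# s{...}{...}xmsg   substitute; the braces are the delimiters
#   (?<!\n)         negative lookbehind: not right after a newline
#   \n              a newline
#   { }             replacement: a single space
#   x               whitespace in the pattern is ignored
#   m               ^ and $ match at every line (no effect here)
#   s               . also matches a newline (no effect here)
#   g               replace every match, not just the first
unwrap_spelled_out() {
    perl -l -p -0 -e 's{(?<!\n)\n}{ }xmsg' "$@"
}
```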

> Does it matter if, after replacing "line that is," that the existing
> position of the newline is preserved?  If not:
>
>    perl -p0e 's{line\sthat\sis}{line which is}xmsg' in.txt

I don't know what "line that is," is referring to. And I don't know what you mean by "that the existing position of the newline is preserved", although my guess is that you're asking whether I want to be able to put back the stripped newlines later. If so, then no, I don't care about remembering where they were. After I'm done, the newlines will need to be in different positions (*if* I even add any back at all). The double newlines belong, but the others were imposed by 80-column line wrapping. As I make all my other changes, those wrapping points will move. And since searches are easier when I don't have to allow for "[ \n]", I'd rather not have wrapping for now anyway.

> This turns the first "line" into:
>
>    ^1 This is a line which is broken by
>    the super-imposed
>    word wrapping.

> Or, you could do stuff with capturing the whitespace that was there, if
> you really wanted:
>
>    perl -p0e 's{line(\s)that(\s)is}{line${1}which${2}is}xmsg' in.txt

Wow, this is really good stuff about perl one-liners. I know just enough perl to understand it (or to be able to deduce the parts I don't know). (I'm assuming that \s stands for any whitespace: " ", \t, or otherwise.)

I've already tackled this particular problem though, using sed and tr. I used sed to replace every "$" (EOL) with "`" (since the document contained no backticks and it didn't seem to be a special character needing a "\"). Then I used tr to strip out all "\n". Then I used sed to replace the last "`" with "", every "``" with "\n\n", and subsequently, every remaining "`" with " ". I've used sed for everything so far, except for the one-time use of tr.
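The sed/tr steps described above can be strung into one pipeline. This is a reconstruction from the description, not the actual script; rejoin is a hypothetical name, and the doubled mark is turned back into two newlines with the POSIX backslash-newline form in the replacement (GNU sed would also accept \n\n there).

```shell
#!/bin/sh
# rejoin: reconstruction of the described steps (hypothetical name).
# Assumes the input never contains a backtick, which is used as a marker.
rejoin() {
    # 1. mark every end of line with a backtick
    # 2. delete the real newlines; echo terminates the one long line
    # 3. drop the last mark, turn a doubled mark (a blank line) back
    #    into two newlines, and turn every other mark into a space
    sed 's/$/`/' "$@" |
        { tr -d '\n'; echo; } |
        sed -e 's/`$//' -e 's/``/\
\
/g' -e 's/`/ /g'
}

# Usage: rejoin 0035 > 0036
```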

It's nice to have a script doing everything from beginning to end, because when the script is done, it will show everything that was done. No relying on a fading memory of what I've done and how I did it.


--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
