Chris Grau wrote:
On Wed, Feb 14, 2007 at 08:25:49AM -0800, John H. Robinson, IV wrote:
Ralph Shumaker wrote:
I've been compiling my script, using sed to do everything I was
previously doing in vim. However, I've hit a snag. One thing that
works in vim does *not* in sed.
vim would strip out all unwanted line feeds with:
":%s/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/cg"
In my script,
"sed -e 's/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/g' 0035 >0036"
doesn't change anything, so (as a test) I reduced it down to match
one line in particular:
"sed -e 's/e\nu/e u/g' 0035"
and still no go. But reducing it to:
"sed -e 's/e$/eeeeeee/g' 0035"
or
"sed -e 's/^u/uuuuuuu/g' 0035"
works (except that it does nothing to the newline).
Any suggestions?
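The \n pattern never matches because sed reads its input one line at a time and removes each line's trailing newline before running the script. A newline only appears in the pattern space if you append the next line with the N command. The sketch below assumes GNU sed, though the :a / N / $!ba loop should also work in BSD sed. The function name join_wrapped and the sample text are hypothetical. The regex is the one from the post.

```shell
#!/bin/sh
# join_wrapped (hypothetical name): load the whole input into sed's
# pattern space, then apply the vim-style join.
#   :a      label the start of a loop
#   N       append "\n" plus the next input line to the pattern space
#   $!ba    if this is not the last line, branch back to :a
# Only after the loop ends does s/// see the embedded newlines.
join_wrapped() {
    sed -e ':a' -e 'N' -e '$!ba' \
        -e 's/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/g'
}

# Hypothetical sample: one wrapped paragraph, a blank line, then another.
printf 'This is a line\nthat is broken\n\nShort line.\n' | join_wrapped
```

The blank line survives because neither character class contains a newline, so "\n\n" is never matched. GNU sed also has -z, which splits input on NUL bytes instead of newlines and therefore reads a NUL-free file as one record. The N loop is shown here because it is not GNU-specific.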
Almost sounds like a job for perl. I will have to go back to the
original problem to see if a nice, clean perl one-liner can tend to
this.
Going back to the original:
^1 This is a line
that is broken by
the super-imposed
word wrapping.
^2 Short line.
^3 Another line.
^4 Yet another.
^5 Some lines:
they wrap; Some
lines: they don't.
Fixing the line wrap is easy enough:
perl -lp0e 's{(?<!\n)\n}{ }xmsg' in.txt > out.txt
This says, "replace any newline that doesn't immediately follow a
newline with a space." It does have a drawback. Each line in the
output file has a single space at the end.
How would you say "replace any newline that is neither preceded nor followed
by a newline"? Of course, I suppose you could just run a second command
to replace all " \n" with "\n".
In the command you give, what is "-lp0e"? Yes, yes, I know they are
switches, but what do they do? How does the whole line read in standard SN
(Stremler Notation®)? e.g.:
perl the command
- begin the switches
l does this
p does that
0 does yet
e does another
' begin the command string
s the substitution command
{ begin the search set
... etc.
Does it matter whether, after replacing "line that is," the existing
position of the newline is preserved? If not:
perl -p0e 's{line\sthat\sis}{line which is}xmsg' in.txt
I don't know what "line that is," is referring to. And I don't know
what you mean by "that the existing position of the newline is
preserved", although my guess is that you're asking if I want to be able
to later put back the newlines that have been stripped out. If so, then
no, I don't care about remembering where they were. After I'm done, the
newlines will need to be in different positions (*if* I even add any
back at all). The double newlines belong, but the others are
superimposed for 80-column line wrapping. As I make all my other
changes, those wrapping points will move. And since it's easier to do
searches without having to account for "[ \n]", I'd rather not have
wrapping for now anyway.
This turns the first "line" into:
^1 This is a line which is broken by
the super-imposed
word wrapping.
Or, you could do stuff with capturing the whitespace that was there, if
you really wanted:
perl -p0e 's{line(\s)that(\s)is}{line${1}which${2}is}xmsg' in.txt
Wow, this is really good stuff about perl one liners. I know just
enough perl to understand it (or to be able to deduce some of the parts
I don't know). (I'm assuming that \s stands for any whitespace (" ",
\t, or otherwise).)
I've already tackled this particular problem though, using sed and tr.
I used sed to replace all "$" (EOL) with "`" (since the document did not
contain any of that character and it didn't seem to be a special
character needing a "\"). Then I used tr to strip out all "\n". Then I
used sed to replace the last "`" with "", all "``" with "\n\n", and
subsequently, all "`" with " ". I've used sed for everything so far,
except for the one-time use of tr.
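That sed/tr/sed sequence can be sketched as a single pipeline. The sketch assumes GNU sed, because POSIX sed does not promise that \n in a replacement produces a newline. As in the post, it also assumes the text contains no backticks to begin with. The function name unwrap and the sample text are hypothetical.

```shell
#!/bin/sh
# unwrap (hypothetical name): join wrapped lines, keep blank-line breaks.
unwrap() {
    sed 's/$/`/' |              # mark every end of line with a backtick
    tr -d '\n' |                # drop all real newlines; input is now one line
    sed -e 's/`$//' \
        -e 's/``/\n\n/g' \
        -e 's/`/ /g'            # last mark -> nothing, double -> blank line, single -> space
}

printf 'This is a line\nthat is broken\n\nShort line.\n' | unwrap
```

The order of the final three substitutions matters. The double backtick must become "\n\n" before the single backticks become spaces.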
It's nice to have a script doing everything from beginning to end
because when the script is done, it will show everything that was done.
No fading memory forgetting what all I've done and how I did it.
--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list