On Wed, Feb 21, 2007 at 02:20:46PM -0800, Ralph Shumaker wrote:
> Chris Grau wrote:
> >Fixing the line wrap is easy enough:
> >
> >   perl -lp0e 's{(?<!\n)\n}{ }xmsg' in.txt > out.txt
> >
> >This says, "replace any newline that doesn't immediately follow a
> >newline with a space."  It does have a drawback.  Each line in the
> >output file has a single space at the end.
> > 
> >
> 
> How would you say "replace any newline that is not preceded nor
> followed by a newline"?  Of course, I suppose you could just run a
> second command to replace all " \n" with "\n".

Untested, but I'd start with,

    s{(?<!\n)\n(?!\n)}{ }xmsg

That's a negative look-behind and a negative look-ahead.  I'm not sure
it would work, but it would be where I'd start my tinkering.

> In the command you give, what is "-lp0e"?  Yes, yes, I know they are
> switches, but what do they do?  How does the whole line read in standard
> SN (Stremler Notation®)?  e.g.:
> 
> perl   the command
> -      begin the switches
> l      does this
> p      does that
> 0      does yet
> e      does another

The switches are documented in the perlrun(1) man page.

-l  strip the record separator from each input record and append a
    newline to every record printed
-p  assume a while/print loop around the code
-0  specify the input record separator; normally this is \n, but without
    a value, uses nul, so a text file with no nul bytes is slurped at
    once (-0777 is the explicit slurp mode)
-e  the code to execute


> '      begin the command string
> s      the substitution command
> {      begin the search set
> ...      etc.

This part is a bit more fun.  I won't break it down character by
character.

s        perform a substitution (in this case, on $_)
{        start the match (by default, uses /, but I like braces)
(?<!\n)  negative look-behind; make sure a newline does not precede what
         is being matched; this will not be part of the final match
\n       the newline we want to match
}        the end of the match
{ }      the replacement, in this case a single space
x        re flag: extended; whitespace and comments are not significant
m        re flag: multiline; ^$ match lines, not begin/end of string
s        re flag: single line: . matches \n now
g        re flag: global replace

Strictly speaking, neither the x nor the s was required (nor the m,
since the pattern has no ^ or $), but I'm in the habit of always using
them these days.

> >Does it matter if, after replacing "line that is," that the existing
> >position of the newline is preserved?  If not:
> >
> >   perl -p0e 's{line\sthat\sis}{line which is}xmsg' in.txt
> > 
> >
> 
> I don't know what "line that is," is referring to.

The literal string from your original example.  I've snipped it from
this response, but I originally took it from your message on 11 Feb with
the subject "vi, regex, and line wrapping."  (Message ID:
[EMAIL PROTECTED]).

>                                                     And I don't know 
> what you mean by "that the existing position of the newline is
> preserved", although my guess is that you're asking if I want to be
> able to later put back the newlines that have been stripped out.  If
> so, then no, I don't care about remembering where they were.

Yes, that's what I meant.

> >Or, you could do stuff with capturing the whitespace that was there,
> >if you really wanted:
> >
> >   perl -p0e 's{line(\s)that(\s)is}{line${1}which${2}is}xmsg' in.txt
> > 
> >
> 
> Wow, this is really good stuff about perl one liners.  I know just
> enough perl to understand it (or to be able to deduce some of the
> parts I don't know).

Or get yourself into a whole lot of trouble.  :)

>                       (I'm assuming that \s stands for any whitespace
> (" ", \t, or otherwise).)

Correct.

> I've already tackled this particular problem though, using sed and tr.
> I used sed to replace all "$" (EOL) with "`" (since the document did
> not contain any of that character and it didn't seem to be a special
> character needing a "\").  Then I used tr to strip out all "\n".  Then
> I used sed to replace the last "`" with "", all "``" with "\n\n", and
> subsequently, all "`" with " ".  I've used sed for everything so far,
> except for the onetime use of tr.

Yes, that's a good solution, too.  I simply threw the Perl solution out
there for fun.
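
For the archives, my reading of that sed and tr recipe would look
something like the sketch below.  This is a guess at the steps as
described, not your actual script, and the function name is made up.  It
assumes the text contains no backtick characters.

```shell
unwrap_sed_tr() {
    # 1. Mark every end of line with a backtick.
    # 2. Strip out all the newlines.
    # 3. Drop the last marker, turn doubled markers back into a blank
    #    line, and turn the remaining single markers into spaces.
    #    A backslash followed by a real newline is the portable way to
    #    put a newline into a sed replacement.
    sed 's/$/`/' |
    tr -d '\n' |
    sed -e 's/`$//' -e 's/``/\
\
/g' -e 's/`/ /g'
}

printf 'one\ntwo\n\nthree\nfour\n' | unwrap_sed_tr
```

On the sample input this should give two joined paragraphs with the
blank line between them intact and no trailing spaces.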

> It's nice to have a script doing everything from beginning to end
> because when the script is done, it will show everything that was
> done.  No fading memory forgetting what all I've done and how I did
> it.

Be sure to document!  :)

-- 
Chris Grau

-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
