James G. Sack (jim) wrote:

James G. Sack (jim) wrote:
John H. Robinson, IV wrote:
Ralph Shumaker wrote:
I've been compiling my script, using sed to do everything I was previously doing in vim. However, I've hit a snag. One thing that works in vim does *not* in sed.

vim would strip out all unwanted line feeds with:
":%s/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/cg"

In my script,
"sed -e 's/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/g' 0035 >0036"
doesn't change anything, so (as a test) I reduced it down to match one line in particular:
"sed -e 's/e\nu/e u/g' 0035"
and still no go. But reducing it to:
"sed -e 's/e$/eeeeeee/g' 0035"
or
"sed -e 's/^u/uuuuuuu/g' 0035"
works (except that it does nothing to the newline).

Any suggestions?
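
A note on why the '\n' version never matches: sed reads its input one line at a time and strips the trailing newline before the script runs, so there is normally no newline in the pattern space for '\n' to match. The usual workaround is to pull the following lines in with the N command first. Here is a minimal sketch, assuming a made-up two-line sample rather than the real contents of 0035, and using the reduced 'e\nu' test pattern from above:

```shell
# Hypothetical two-line sample standing in for file 0035.
# Line-at-a-time processing: the pattern space never holds a newline,
# so this substitution changes nothing.
unchanged=$(printf 'some line\nup next\n' | sed -e 's/e\nu/e u/g')

# Loop: append the next line (N) and branch back to label "a" until the
# last line is read, so the whole input sits in the pattern space.
# Only then run the substitution, where \n can finally match.
joined=$(printf 'some line\nup next\n' |
  sed -e ':a' -e 'N' -e '$!ba' -e 's/e\nu/e u/g')

printf '%s\n' "$unchanged"   # still two lines
printf '%s\n' "$joined"      # prints: some line up next
```

This slurps the whole file into memory, which is fine for ordinary text files but worth knowing about for very large ones.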
Almost sounds like a job for perl. I will have to go back to the
original problem to see if a nice, clean perl one-liner can tend to
this.

I'm sure jhriv will come up w/ a one-liner for you, but I just wanted to
remark that your use of \. within character class brackets is not
required. If vim requires it, then it's vim that's broke.

The usual language is that '.' has no special meaning within brackets.

It's useful to dwell on this a bit -- spending a few minutes here makes
regular expressions a little less intimidating.

Normally the dot character stands for _any character_ so you can see
that it wouldn't make much sense to define a character class list that
contains such a wild card. If '.' means anything, then any other content
is superfluous!

For further thought, the only specials within brackets ought to be '-'
(for ranges), ']' (the end-delimiter for the character class), and a
leading '^' (which negates the class).
In vim and perl, '\' is also special inside brackets, so you can write
'\]' to mean a literal ']', and likewise '\-' and, of course, '\\'. In
POSIX tools such as sed and grep, though, a backslash inside brackets is
just a literal backslash, so those escapes are not available there.
By convention, putting the '-' as the first or last character in the
brackets also means a literal '-'. Putting ']' as the first character
(right after any '^') likewise means a literal ']', since otherwise you
would have an empty class. Where the escapes exist, using '\-' and '\]'
is arguably better than remembering additional conventions, but in sed
and grep the positional forms are the ones that work, so it pays to
recognize them when you see them.
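
These bracket rules are easy to check from the shell. A minimal sketch, assuming a POSIX-style grep and made-up one-line inputs; the last check relies on the fact that POSIX tools treat a backslash inside brackets as a literal character:

```shell
# Each check counts how many of two input lines match the class.

# '.' is literal inside brackets: "a.c" matches, "abc" does not.
dot_hits=$(printf '%s\n' 'a.c' 'abc' | grep -c 'a[.]c')

# ']' placed first in the class is a literal ']'.
bracket_hits=$(printf '%s\n' ']' 'x' | grep -c '^[]]$')

# '-' placed last is a literal '-', not a range operator.
dash_hits=$(printf '%s\n' '-' 'b' | grep -c '^[a-]$')

# In POSIX tools the backslash is literal too, so [\.] matches
# either a backslash or a dot.
backslash_hits=$(printf '%s\n' '\' 'x' | grep -c '^[\.]$')

echo "$dot_hits $bracket_hits $dash_hits $backslash_hits"   # prints: 1 1 1 1
```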


I should have also mentioned constructs such as '\xHH' for hex values
(and relatives for octal or decimal), and then other conventional names
for non-printing characters, e.g., '\n'.

One problem is that different applications have different names and
conventions. That adds some extra challenge to regex use <sigh>.

Yeah, I've run into that. But fortunately, for the moment anyway, I'm only dealing with one application, like "What I *sed*" in the subject line. :-)


--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
