James G. Sack (jim) wrote:
> John H. Robinson, IV wrote:
>> Ralph Shumaker wrote:
>>> I've been compiling my script, using sed to do everything I was 
>>> previously doing in vim. However, I've hit a snag. One thing that works 
>>> in vim does *not* in sed.
>>>
>>> vim would strip out all unwanted line feeds with:
>>> ":%s/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/cg"
>>>
>>> In my script,
>>> "sed -e 's/\([ a-zA-Z0-9,\.:;?!)?-]\)\n\([A-Z^a-z(]\)/\1 \2/g' 0035 >0036"
>>> doesn't change anything, so (as a test) I reduced it down to match one 
>>> line in particular:
>>> "sed -e 's/e\nu/e u/g' 0035"
>>> and still no go. But reducing it to:
>>> "sed -e 's/e$/eeeeeee/g' 0035"
>>> or
>>> "sed -e 's/^u/uuuuuuu/g' 0035"
>>> works (except that it does nothing to the newline).
>>>
>>> Any suggestions?
>> Almost sounds like a job for perl. I will have to go back to the
>> original problem to see if a nice, clean perl one-liner can tend to
>> this.
>>
> 
> I'm sure jhriv will come up w/ a one-liner for you, but I just wanted to
> remark that your use of \. within character class brackets is not
> required. If vim requires it, then it's vim that's broke.
> 
> The usual language is that '.' has no special meaning within brackets.
> 
> It's useful to dwell on this a bit -- spending a few minutes here makes
> regular expressions a little less intimidating.
> 
> Normally the dot character stands for _any character_ so you can see
> that it wouldn't make much sense to define a character class list that
> contains such a wild card. If '.' means anything, then any other content
> is superfluous!
> 
> For further thought, the only specials within brackets ought to be '-'
> (for ranges), ']' (the end-delimiter for the character class), and a
> leading '^' (which negates the class).
> Then you kinda have to add '\' to the specials so that you can write
> '\]' to mean a literal ']'. You can also use '\-' and, of course '\\'.
> By convention, putting the '-' as the first or last character in the
> brackets also means a literal '-'. I suppose putting ']' as the first
> character might logically also mean a literal ']' (since otherwise you
> have an empty class -- maybe it actually works that way?). Arguably,
> using '\-', '\]' is better than remembering additional conventions, but
> I mention it so that you will recognize them when you see them.
> 

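To make the bracket rules above concrete, here is a minimal sketch using
Python's re module. Python is purely my choice for illustration (it is not
what the original script uses), and the helper name in_class is made up for
this example. Other tools differ: POSIX bracket expressions, for instance,
treat a backslash inside brackets as a literal backslash, so check your
tool's documentation before relying on '\]' or '\-'.

```python
import re

def in_class(char_class: str, ch: str) -> bool:
    """Return True if the single character ch matches the bracket expression."""
    return re.fullmatch(char_class, ch) is not None

# Each entry: (bracket expression, test character).
checks = [
    (r"[.]", "."),      # an unescaped dot in brackets is a literal dot
    (r"[.]", "x"),      # ... so it does not match an arbitrary character
    (r"[\.]", "."),     # the escaped form behaves the same way
    (r"[a-c]", "b"),    # '-' between two characters forms a range
    (r"[a\-c]", "b"),   # an escaped '-' no longer forms a range
    (r"[a\-c]", "-"),   # ... it is a literal '-'
    (r"[-ac]", "-"),    # a leading '-' is literal
    (r"[ac-]", "-"),    # a trailing '-' is literal
    (r"[a\]]", "]"),    # an escaped ']' is literal
    (r"[]a]", "]"),     # a leading ']' is literal too (in Python's re)
    (r"[a\\]", "\\"),   # an escaped backslash is a literal backslash
]

for char_class, ch in checks:
    print(f"{char_class!r} matches {ch!r}: {in_class(char_class, ch)}")
```
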
I should have also mentioned constructs such as '\xHH' for hex values
(and relatives for octal or decimal), and the conventional names for
non-printing characters, e.g., '\n'.
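
As a rough illustration of those escapes, again in Python's re module (my
assumption for the example; sed and vim spell some of these differently or
not at all), the sketch below writes the line feed three ways and then joins
a broken line. The sample text and the name matches_newline are invented for
the example. Note that Python sees the whole text as one string; sed normally
reads one line at a time with the trailing newline removed, so a plain s///
never sees a '\n' to match.

```python
import re

# Three spellings of the same character: line feed, code point 10.
spellings = [r"\n", r"\x0a", r"\012"]

def matches_newline(escape: str) -> bool:
    """Return True if a bracket expression built from the escape matches a line feed."""
    return re.fullmatch("[" + escape + "]", "\n") is not None

for escape in spellings:
    print(f"[{escape}] matches a line feed: {matches_newline(escape)}")

# Joining a line broken in mid-sentence, in the spirit of the reduced
# test pattern 's/e\nu/e u/g' from the original question.
text = "some\nunwanted line feed"
joined = re.sub(r"e\nu", "e u", text)
print(joined)
```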

One problem is that different applications have different names and
conventions. That adds some extra challenge to regex use <sigh>.

..j


-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list