Re: [Linux-users] help wanted with sed usage

Jim Cheetham Sat, 02 Jul 2011 00:51:00 -0700

On Fri, Jul 1, 2011 at 5:11 PM, Kent Fredric <[email protected]> wrote:
> Goofed around and read some man pages, discovered the "branch" "command".


Yes, interesting, isn't it? For years I've used sed as a 'simple' line
by line transformer, and moved over to awk for more complex things
(and then perl when that became available in common base builds), but
sed certainly seems to have enough in it to do much more than I
expected.
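
For anyone who hasn't met the branch commands: ":label" marks a spot
in the script, and "t label" jumps back to it if a substitution has
succeeded since the last line was read (or since the last "t"). A
minimal sketch, on a task unrelated to Glenn's data; the function name
and the thousands-separator example are mine, and I'm assuming a sed
that accepts -E for extended regexps:

```shell
# Hypothetical helper: insert thousands separators by repeating one
# substitution until it no longer matches.
add_commas() {
    printf '%s\n' "$1" | sed -E \
        -e ':again' \
        -e 's/([0-9])([0-9]{3})($|,)/\1,\2\3/' \
        -e 't again'
}

add_commas 1234567   # 1,234,567
```

Each pass adds one comma, working from the right; the loop ends when
no run of four digits is left in front of a comma or the end of line.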

> abcdefghi -> abcdefghiZ  may also be considered "wrong" , but thats
> hard to fix without massive changes.

Well, Glenn's description of his real-world problem suggests that it
is correct -- take a "n" character field, and make it "n+1" by
appending a marker. Therefore that line is correct.

However the earlier "abcd -> abcdZ" is certainly wrong. But because it
is only wrong in one case, we can fix that up with a simple clause at
the end -- s/^(.{4})Z/\1/
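
A quick check of that clause on its own, as a sketch (assuming a sed
that accepts -E; the function name is made up for illustration). Only
a Z in the fifth position is touched:

```shell
# Hypothetical helper: drop a Z that directly follows the first four
# characters of the line.
strip_early_z() { printf '%s\n' "$1" | sed -E 's/^(.{4})Z/\1/'; }

strip_early_z abcdZ        # abcd
strip_early_z abcdefghiZ   # abcdefghiZ (unchanged)
```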

I had a look at my initial linear solution, which goes wrong in more
cases than Joe's :-

                       abcd -> abcdZ
                     abcde -> abcdeZ
                    abcdef -> abcdeZf
                   abcdefg -> abcdeZfg
                  abcdefgh -> abcdefghZ

It is of course arguable that we don't need to be correct in the face
of invalid input like this, because we're just a simple sed
invocation. But we're already a long way from Glenn's original input
data when I started to worry about variable length input, so we might
as well see how far we can stretch sed ...

My 4-step solution starts with "insert Z after every 4th character",
and its second clause tries to clean up the resulting early Z. A small
modification there handles shorter inputs as well: make the group
after the first Z optional ... s/^(.{4})Z(.{4})?/\1\2/;

abcd => abcd
abcde => abcde
abcdef => abcdef
abcdefg => abcdefg
abcdefgh => abcdefghZ
abcdefghi => abcdefghiZ
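
To see what the "?" buys, here is the second clause in isolation, fed
the strings the first clause produces. This is a sketch: the strict
form is my reconstruction of the earlier clause (the same expression
without the "?"), the function names are invented, and -E is assumed.

```shell
# Strict form: reconstructed, requires four characters after the Z.
strict()  { printf '%s\n' "$1" | sed -E 's/^(.{4})Z(.{4})/\1\2/'; }
# Relaxed form from this mail: the group after the Z may be absent.
relaxed() { printf '%s\n' "$1" | sed -E 's/^(.{4})Z(.{4})?/\1\2/'; }

strict  abcdZe       # abcdZe (no match, the early Z survives)
relaxed abcdZe       # abcde
relaxed abcdZefghZ   # abcdefghZ
```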

So now I have only one broken case, which is "abcdefghZ". I already
have a fixup looking for Z...Z$, so I can extend that to look for the
same ...Z$ but preceded by either Z or ^.{5} ...
s/(^.{5}|Z)(.{3})Z$/\1\2/;

's/(.{4})/\1Z/g;' 's/^(.{4})Z(.{4})?/\1\2/;' 's/Z(.)/\1Z/g;'
's/(^.{5}|Z)(.{3})Z$/\1\2/;'

                         a => a
                        ab => ab
                       abc => abc
                      abcd => abcd
                     abcde => abcde
                    abcdef => abcdef
                   abcdefg => abcdefg
                  abcdefgh => abcdefgh
                 abcdefghi => abcdefghiZ
                abcdefghij => abcdefghiZj
               abcdefghijk => abcdefghiZjk
              abcdefghijkl => abcdefghiZjkl
             abcdefghijklm => abcdefghiZjklmZ
            abcdefghijklmn => abcdefghiZjklmZn
           abcdefghijklmno => abcdefghiZjklmZno
          abcdefghijklmnop => abcdefghiZjklmZnop
         abcdefghijklmnopq => abcdefghiZjklmZnopqZ
        abcdefghijklmnopqr => abcdefghiZjklmZnopqZr
       abcdefghijklmnopqrs => abcdefghiZjklmZnopqZrs
      abcdefghijklmnopqrst => abcdefghiZjklmZnopqZrst
     abcdefghijklmnopqrstu => abcdefghiZjklmZnopqZrstuZ
    abcdefghijklmnopqrstuv => abcdefghiZjklmZnopqZrstuZv
   abcdefghijklmnopqrstuvw => abcdefghiZjklmZnopqZrstuZvw
  abcdefghijklmnopqrstuvwx => abcdefghiZjklmZnopqZrstuZvwx
 abcdefghijklmnopqrstuvwxy => abcdefghiZjklmZnopqZrstuZvwxyZ
abcdefghijklmnopqrstuvwxyz => abcdefghiZjklmZnopqZrstuZvwxyZz
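
For anyone wanting to regenerate that table, something along these
lines should do it. It is a sketch: the four clauses are the ones
above, but the wrapper function, the loop, and the use of "sed -E"
with one -e per clause are my assumptions about how to run them.

```shell
# Apply the four clauses in order to a single string.
mark() {
    printf '%s\n' "$1" | sed -E \
        -e 's/(.{4})/\1Z/g' \
        -e 's/^(.{4})Z(.{4})?/\1\2/' \
        -e 's/Z(.)/\1Z/g' \
        -e 's/(^.{5}|Z)(.{3})Z$/\1\2/'
}

# Feed it every prefix of the alphabet, from "a" up to the full 26.
alphabet=abcdefghijklmnopqrstuvwxyz
n=1
while [ "$n" -le "${#alphabet}" ]; do
    input=$(printf '%s' "$alphabet" | cut -c "1-$n")
    printf '%26s => %s\n' "$input" "$(mark "$input")"
    n=$((n + 1))
done
```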



> (there's a magic property of
> this problem that will only work as long as the "first" value is 5 and
> the "Second" value is 4.  Adjust either of these by 1 and it injects
> Z's in the characters that have been moved to the right hand end.

Really? I would have thought it was more likely to be based on the
prefix being longer than the subsequent fields; but I'm beginning to
lose enthusiasm for testing more! Mostly because I have 'magic
numbers' in the sed groups, and it's not clear even to me what they
represent!

That in itself is surely an argument for using a higher level
language, even if it is just to get the ability to construct regexp
objects in a clearly readable fashion.
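
Even staying in the shell, naming the numbers helps a little. The
sketch below is a guess at what they mean: I am assuming the 5 is
"group width plus one" and the 3 is "group width minus one". That
reading reproduces the table above for a width of 4; other widths are
untested. The variable and function names are invented.

```shell
# Assumed meaning of the magic numbers (not confirmed in the thread).
width=4                 # characters between inserted Z markers
lead=$((width + 1))     # the 5 in ^.{5}
rest=$((width - 1))     # the 3 in .{3}

# Same four clauses, with the numbers substituted by the shell.
mark_w() {
    printf '%s\n' "$1" | sed -E \
        -e "s/(.{$width})/\\1Z/g" \
        -e "s/^(.{$width})Z(.{$width})?/\\1\\2/" \
        -e 's/Z(.)/\1Z/g' \
        -e "s/(^.{$lead}|Z)(.{$rest})Z\$/\\1\\2/"
}

mark_w abcdefghijklm   # abcdefghiZjklmZ
```

The double-quoted clauses need "\\1" and "\$" so that the shell hands
sed a literal \1 and $, which is its own small readability cost.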

> And thats hard to solve without temporary variable storage or
> lookaround ( the latter of which is not supported by POSIX sed as far
> as I can tell )

sed is indeed one of the more limited regexp engines, but as Bryce
pointed out, it does have some other interesting language features,
like the pattern/hold space for temporary storage.
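
As a small illustration of the hold space (a well-known one-liner, not
a solution to Glenn's problem; the function name is mine), this
reverses the order of its input lines:

```shell
# Reverse input lines using the hold space as temporary storage:
#   1!G  on every line except the first, append the hold space to
#        the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print what has accumulated
reverse_lines() { sed -n '1!G;h;$p'; }

printf 'one\ntwo\nthree\n' | reverse_lines
```

That prints three, two, one on separate lines.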

-jim

_______________________________________________
Linux-users mailing list
[email protected]
http://lists.canterbury.ac.nz/mailman/listinfo/linux-users
