On Fri, Jul 1, 2011 at 5:11 PM, Kent Fredric <[email protected]> wrote:
> Goofed around and read some man pages, discovered the "branch" "command".
Yes, interesting, isn't it? For years I've used sed as a 'simple' line
by line transformer, and moved over to awk for more complex things
(and then perl when that became available in common base builds), but
sed certainly seems to have enough in it to do much more than I
expected.
> abcdefghi -> abcdefghiZ may also be considered "wrong" , but thats
> hard to fix without massive changes.
Well, Glenn's description of his real-world problem suggests that it
is correct -- take a "n" character field, and make it "n+1" by
appending a marker. Therefore that line is correct.
However the earlier "abcd -> abcdZ" is certainly wrong. But because it
is only wrong in one case, we can fix that up with a simple clause at
the end -- s/^(.{4})Z/\1/
I had a look at my initial linear solution, which goes wrong in more
cases than Joe's :-
abcd -> abcdZ
abcde -> abcdeZ
abcdef -> abcdeZf
abcdefg -> abcdeZfg
abcdefgh -> abcdefghZ
It is of course arguable that we don't need to be correct in the face
of invalid input like this, because we're just a simple sed
invocation. But we're already a long way from Glenn's original input
data when I started to worry about variable length input, so we might
as well see how far we can stretch sed ...
In my 4-step solution starting with "insert Z every 4th character" the
second clause tries to clean up this early Z; a small modification
there to make shorter cleanups possible by making the group after the
first Z optional ... s/^(.{4})Z(.{4})?/\1\2/;
abcd => abcd
abcde => abcde
abcdef => abcdef
abcdefg => abcdefg
abcdefgh => abcdefghZ
abcdefghi => abcdefghiZ
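The effect of making that group optional can be checked outside sed. What follows is only a rough sketch in Python, with re.sub standing in for sed's s command. The function names are made up for illustration, and the "strict" pattern is my assumed reconstruction of the clause before the change, since it isn't quoted above.

```python
import re

def insert_markers(line):
    # Clause 1: insert Z after every 4 characters (sed: s/(.{4})/\1Z/g)
    return re.sub(r'(.{4})', r'\1Z', line)

def cleanup_strict(line):
    # Assumed earlier form of clause 2: only fires when 4 more
    # characters follow the first Z
    return re.sub(r'^(.{4})Z(.{4})', r'\1\2', line, count=1)

def cleanup_optional(line):
    # Modified clause 2: the group after the first Z may be absent
    # (an unmatched \2 expands to nothing in Python 3.5 and later)
    return re.sub(r'^(.{4})Z(.{4})?', r'\1\2', line, count=1)

for word in ('abcd', 'abcdef', 'abcdefghi'):
    marked = insert_markers(word)
    print(word, marked, cleanup_strict(marked), cleanup_optional(marked))
```

For the short inputs the strict form leaves the early Z in place, while the optional form removes it; for longer inputs the two should behave the same.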
So now I have only one broken case, which is "abcdefghZ". I already
have a fixup looking for Z...Z$, so I can extend that to look for the
same ...Z$ but preceded by either Z or ^.{5} ...
s/(^.{5}|Z)(.{3})Z$/\1\2/;
Putting the four clauses together:
's/(.{4})/\1Z/g;' 's/^(.{4})Z(.{4})?/\1\2/;' 's/Z(.)/\1Z/g;'
's/(^.{5}|Z)(.{3})Z$/\1\2/;'
which gives:
a => a
ab => ab
abc => abc
abcd => abcd
abcde => abcde
abcdef => abcdef
abcdefg => abcdefg
abcdefgh => abcdefgh
abcdefghi => abcdefghiZ
abcdefghij => abcdefghiZj
abcdefghijk => abcdefghiZjk
abcdefghijkl => abcdefghiZjkl
abcdefghijklm => abcdefghiZjklmZ
abcdefghijklmn => abcdefghiZjklmZn
abcdefghijklmno => abcdefghiZjklmZno
abcdefghijklmnop => abcdefghiZjklmZnop
abcdefghijklmnopq => abcdefghiZjklmZnopqZ
abcdefghijklmnopqr => abcdefghiZjklmZnopqZr
abcdefghijklmnopqrs => abcdefghiZjklmZnopqZrs
abcdefghijklmnopqrst => abcdefghiZjklmZnopqZrst
abcdefghijklmnopqrstu => abcdefghiZjklmZnopqZrstuZ
abcdefghijklmnopqrstuv => abcdefghiZjklmZnopqZrstuZv
abcdefghijklmnopqrstuvw => abcdefghiZjklmZnopqZrstuZvw
abcdefghijklmnopqrstuvwx => abcdefghiZjklmZnopqZrstuZvwx
abcdefghijklmnopqrstuvwxy => abcdefghiZjklmZnopqZrstuZvwxyZ
abcdefghijklmnopqrstuvwxyz => abcdefghiZjklmZnopqZrstuZvwxyZz
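For anyone who wants to re-run the table without a sed that supports extended regexps, here is a minimal sketch of the same four clauses in Python. It assumes re.sub behaves like sed's s command for these patterns (count=0 for the g flag, count=1 otherwise); the names CLAUSES and mark are my own, not part of any tool.

```python
import re

# The four clauses, in order, as (pattern, replacement, count) triples.
# count=0 mirrors sed's g flag; count=1 mirrors a plain s command.
CLAUSES = [
    (r'(.{4})', r'\1Z', 0),               # insert Z every 4th character
    (r'^(.{4})Z(.{4})?', r'\1\2', 1),     # drop the first, too-early Z
    (r'Z(.)', r'\1Z', 0),                 # shift each Z one place right
    (r'(^.{5}|Z)(.{3})Z$', r'\1\2', 1),   # drop a trailing Z on a short field
]

def mark(line):
    # Apply each clause to the result of the previous one, as sed would.
    for pattern, replacement, count in CLAUSES:
        line = re.sub(pattern, replacement, line, count=count)
    return line

alphabet = 'abcdefghijklmnopqrstuvwxyz'
for n in range(1, len(alphabet) + 1):
    word = alphabet[:n]
    print(word, '=>', mark(word))
```

The loop prints one line per prefix of the alphabet, in the same layout as the table above, so the two can be compared directly.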
> (there's a magic property of
> this problem that will only work as long as the "first" value is 5 and
> the "Second" value is 4. Adjust either of these by 1 and it injects
> Z's in the characters that have been moved to the right hand end.
Really? I would have thought it was more likely to be based on the
prefix being longer than the subsequent fields; but I'm beginning to
lose enthusiasm for testing more! Mostly because I have 'magic
numbers' in the sed groups, and it's not clear even to me what they
represent!
That in itself is surely an argument for using a higher level
language, even if it is just to get the ability to construct regexp
objects in a clearly readable fashion.
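As a rough illustration of that point, here is a sketch in Python where the magic numbers get names. It is not a translation of the sed clauses: it slices off the prefix and then substitutes, which is a different technique. The meanings given to 5 and 4 (a prefix width and a field width) are my reading of the output table rather than anything confirmed, and all the names are hypothetical.

```python
import re

PREFIX_WIDTH = 5   # leading characters left alone (assumed meaning of the 5)
FIELD_WIDTH = 4    # width of each field that gets a marker (assumed meaning of the 4)
MARKER = 'Z'

# One complete field, built from the named width instead of a bare "4".
FIELD = re.compile(r'(.{%d})' % FIELD_WIDTH)

def mark_fields(line):
    prefix, rest = line[:PREFIX_WIDTH], line[PREFIX_WIDTH:]
    # Append the marker after every complete field; a short tail is untouched.
    return prefix + FIELD.sub(lambda m: m.group(1) + MARKER, rest)

print(mark_fields('abcdefghi'))
print(mark_fields('abcdefghijkl'))
```

With the widths named, trying a different prefix or field width means changing one constant rather than hunting for every 3, 4 and 5 in the expressions.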
> And thats hard to solve without temporary variable storage or
> lookaround ( the latter of which is not supported by POSIX sed as far
> as I can tell )
sed is indeed one of the more limited regexp engines, but as Bryce
pointed out, it does have some other interesting language features,
like the pattern/hold space for temporary storage.
-jim