Re: well, blew it... sed or perl q again.

Gary Kline Wed, 31 Dec 2008 16:57:44 -0800

On Wed, Dec 31, 2008 at 03:20:14PM -0500, Karl Vogel wrote:
> >> On Tue, 30 Dec 2008 11:31:14 -0800, 
> >> Gary Kline <[email protected]> said:
> 
> G> The problem is that there are many, _many_ embedded "<A
> G> HREF="http://whatever"> Site</A>" links in my hundreds, or thousands,
> G> of files.  I only want to delete the "http://<junkfoo.com>" lines, _not_
> G> the other HREF links.
> 
>    Use perl.  You'll want the "i" option to do case-insensitive matching,
>    plus "s" so that "." also matches newlines and a match can span
>    multiple lines ("m" only changes where ^ and $ anchor); the first
>    quoted line above shows one of several places where a URL can cross
>    a line-break.
> 
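To make the modifier point concrete, here is a minimal sketch of how "i", "s" and "m" behave on a tag broken across two lines. The sample string and variable names are made up for illustration; only the junkfoo.com URL comes from the thread.

```perl
use strict;
use warnings;

# A link whose tag is split across two lines, like the quoted text above.
my $text = qq{<A\nHREF="http://junkfoo.com/">Site</A>};

# Without /s, "." stops at the newline, so the pattern cannot cross lines.
my $plain  = ($text =~ m{<a.*?href}i)  ? 1 : 0;

# With /s, "." matches the newline as well.
my $dotall = ($text =~ m{<a.*?href}is) ? 1 : 0;

# /m on its own only changes ^ and $; "." still stops at the newline.
my $multi  = ($text =~ m{<a.*?href}im) ? 1 : 0;

# prints: plain=0 dotall=1 multi=0
print "plain=$plain dotall=$dotall multi=$multi\n";
```

So it is "s", not "m", that lets the match span the line break; "m" is harmless here but does not help.
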
>    You might want to leave the originals completely alone.  I never trust
>    programs to modify files in place:
> 
>      you% mkdir /tmp/work
>      you% find . -type f -print | xargs grep -li http://junkfoo.com > FILES
>      you% pax -rwdv -pe /tmp/work < FILES
            ^^^


                        pax is like cpio, isn't it?

                        anyway, yes, i'll ponder this.  i [mis]-spent hours
                        undoing something bizarre that my scrub.c binary did
                        to directories, turning foo and bar (and scores
                        more) into foo and foo.bak, bar and bar.bak.  the
                        .bak ones were the saved directories.  the foo, bar
                        were bizarre.  i couldn't write/cp/mv over them.  had
                        to carefully rm -f foo; mv foo.bak foo.... [et cetera]......

                        then i scp'd my files to two other computers.
                        (*mumble*)

> 
>    Your perl script can just read FILES and overwrite the stuff in the new
>    directory.  You'll want to slurp the entire file into memory so you catch
>    any URL that spans multiple lines.  Try the script below; it works for
>    input like this:
> 
>       This
>       <a HREF="http://junkfoo.com">
>              Site</A> should go away too.
> 
>       And so should
>       <a HREF=
>         "http://junkfoo.com/"
>       > Site</A> this
> 
>       And finally <a HREF="http://junkfoo.com/">Site</A> this
> 
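As a rough illustration of why slurping matters, the sketch below reads the second sample above once line by line and once in a single gulp. The in-memory filehandle and the variable names are assumptions made for the demo; a real run would open the files listed in FILES.

```perl
use strict;
use warnings;

# One of the sample links, with its opening tag spread over three lines.
my $data = qq{And so should\n<a HREF=\n  "http://junkfoo.com/"\n> Site</A> this\n};

# Any complete <a ...>...</a> element; /s lets "." cross newlines.
my $pattern = qr{<a\b[^>]*>.*?</a>}is;

# Line by line: no single line holds a whole element, so nothing matches.
open(my $fh, '<', \$data) or die "in-memory open failed: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ $pattern;
}
close($fh);

# Slurped: undefining $/ makes <$fh> return the entire file as one string.
open($fh, '<', \$data) or die "in-memory open failed: $!";
my $contents = do { local $/; <$fh> };
close($fh);
my $slurp_hits = () = $contents =~ /$pattern/g;

# prints: line-by-line=0 slurped=1
print "line-by-line=$line_hits slurped=$slurp_hits\n";
```
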
> -- 
> Karl Vogel                      I don't speak for the USAF or my company
> 
> The average person falls asleep in seven minutes.
>                                         --item for a lull in conversation
> 
> ---------------------------------------------------------------------------
> #!/usr/bin/perl -w
> 
> use strict;
> 
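> # The href attribute that points at the unwanted site.
> my $URL = 'href\s*=\s*"http://junkfoo\.com/*"';
> my $contents;
> my $fh;
> my $infile;
> my $outfile;
> 
> # Each input line is a file name, as written to FILES by find/grep.
> while (<>) {
>     chomp;
>     $infile = $_;
> 
>     # Write to the copy under /tmp/work, not to the original.
>     s{^\./}{/tmp/work/};
>     $outfile = $_;
> 
>     open ($fh, '<', $infile) or die "$infile: $!";
>     $contents = do { local $/; <$fh> };   # slurp the whole file
>     close ($fh);
> 
>     $contents =~ s{              # substitute ...
>                     <a\b[^>]*?   # ... URL start, staying inside one tag
>                     $URL         # ... actual link
>                     [^>]*>       # ... rest of the opening tag
>                     .*?          # ... min # of chars including newline
>                     </a>         # ... until we end
>                   }
>                   { }gixms;      # ... with a single space
> 
>     open ($fh, '>', $outfile) or die "$outfile: $!";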
>     print $fh $contents;
>     close ($fh);
> }
> 
> exit(0);
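
For trying the pattern without touching any files, the substitution can be wrapped in a small function and run on an in-memory string. This is only a sketch: strip_links is a hypothetical helper, and the example.org link is invented to check that other HREF links are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every <a>...</a> whose href points at
# http://$host with a single space, leaving all other links untouched.
sub strip_links {
    my ($html, $host) = @_;
    my $h = quotemeta $host;       # make the dot in the host name literal
    $html =~ s{
        <a\b [^>]*?                # tag start, staying inside this one tag
        href \s* = \s*             # attribute name, any case, optional spaces
        "http://$h/*"              # the unwanted URL
        [^>]* >                    # rest of the opening tag
        .*?                        # link text, may include newlines
        </a>
    }{ }gixs;
    return $html;
}

my $sample = <<'END';
Keep <a href="http://example.org/">this</a> link.
This
<a HREF="http://junkfoo.com">
       Site</A> should go away too.
And so should
<a HREF=
  "http://junkfoo.com/"
> Site</A> this
END

my $cleaned = strip_links($sample, 'junkfoo.com');
print $cleaned;
```

Using [^>]* rather than .*? before the href keeps the match from starting at an earlier, unrelated link and swallowing everything up to the junkfoo one.
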
> _______________________________________________
> [email protected] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "[email protected]"

-- 
 Gary Kline  [email protected]  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
    The 2.17a release of Jottings: http://jottings.thought.org/index.php
