On Sat, Sep 02, 2000 at 05:27:49PM -0400, Bob Bernstein wrote:
> erik <[EMAIL PROTECTED]> wrote:
> 
> > > ##  Use STDIN if no files are given
> > > $ARGV[0] = "-" unless @ARGV;
> > > 
> > > ##  Strip out anything contained in an SGML markup tag.  This is not
> > > ##  very pretty and rather inefficient, but it does take care of tags
> > > ##  which cross line or paragraph boundaries.
> > > foreach $file (@ARGV) {
> > >   open(INPUT,$file);
        # while there's text to get
        while(<INPUT>) {
                # while there's a starting (maybe complete) tag
                while (s/<[^>]*(>?)//) {
                        # if not complete (<start but no finish)
                        if ( ! $1) {
                                my $tag;
                                while($tag = <INPUT>) {
                                        # keep going until we find the end-of-tag >
                                        last if $tag =~ s/.*?>//;
                                }
                                # maybe add a space wherever tags were ripped out? up to you
                                # ($tag is undef if we hit EOF before the tag closed)
                                $_ .= $tag if defined $tag;
                        }
                }
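                munge($_);      # munge() stands in for your own processing
        }
        close(INPUT);
}       # closes the foreach loop quoted above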

Note: this ain't tested, but it looks workable to me. It reads a line
at a time and lets perl's regex muscles do the heavy lifting for you.
Of course, TMTOWTDI (there's more than one way to do it)...
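
For anyone who wants something to run, here is a minimal sketch of the same loop wrapped in a sub and fed from an in-memory filehandle. The name strip_tags, the lexical filehandle and the sample SGML are my own assumptions for illustration, not part of the original script, and it only handles plain <...> tags (no comments or marked sections).

```perl
use strict;
use warnings;

# Hypothetical helper: return the text read from $fh with <...> tags removed,
# including tags that span several lines.
sub strip_tags {
    my ($fh) = @_;
    my $out = '';
    while (defined(my $line = <$fh>)) {
        # Remove each tag that starts on this line.
        while ($line =~ s/<[^>]*(>?)//) {
            next if $1;    # tag was complete; look for the next one
            # Tag is still open: skip lines until one closes it.
            my $rest;
            while (defined($rest = <$fh>)) {
                last if $rest =~ s/.*?>//;
            }
            # $rest is undef if the file ended inside a tag.
            $line .= $rest if defined $rest;
        }
        $out .= $line;
    }
    return $out;
}

# Sample input (made up): one tag crosses a line boundary.
my $sgml = "<para>Zero is 0, <emphasis\nrole=\"strong\">really</emphasis>.\n</para>\n";
open(my $fh, '<', \$sgml) or die "cannot open in-memory file: $!";
my $text = strip_tags($fh);
close($fh);
print $text;
```

The text after the closing '>' is glued back onto the current line, so the outer substitution gets another pass at any further tags on it.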

> I had trouble with your idea, but I went back to the original script I posted
> and discovered that the problem is it dies whenever a numerical '0' is
> encountered! Apart from that it works fine. It just so happened I had a '0' in
> the first few lines of my SGML, but I didn't get the implication.
> 
> So zero makes the condition '$char = getc(INPUT)' evaluate to false, dumping
> the flow down to closing the file. What's the perl equivalent of WHILE NOT
> EOF? <g>

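        while (<FILEHANDLE>) { ... }
i.e.
        while ($_ = <FILEHANDLE>) { munge($_); }

Perl tests a line read in a while condition for definedness rather than
truth, so a line holding just '0' doesn't end the loop. getc() gets no such
special treatment, so there you have to write the defined() test yourself.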
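
If you'd rather keep the character-at-a-time approach, a rough sketch of the fix looks like this. getc returns undef at end of file, while '0' is defined but false, so the loop tests defined() instead of truth. The read_chars name and the sample text are assumptions of mine for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle one character at a time.
sub read_chars {
    my ($fh) = @_;
    my $text = '';
    # Test definedness, not truth, so a '0' character doesn't end the loop.
    while (defined(my $char = getc($fh))) {
        $text .= $char;
    }
    return $text;
}

# In-memory filehandle standing in for the SGML file (sample text is made up).
my $sample = "line 10\n0 items\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
my $copy = read_chars($fh);
close($fh);
print $copy;
```

With a bare `while ($char = getc($fh))` the same input would stop at the '0' in "10".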

