On Wed, 22 Jan 2003, Rob Dixon wrote:
> Hi George. I think you'd have had an answer by now if there was
> one. I can't think of anything but I wasn't willing to post and say
> 'it can't be done' without waiting for others' ideas.
>
> George P. wrote:
> > But now, I need to check for all classes other than "text";
> >
> > This has me stumped!!
> >
> > For eg:
> > $str = '<TD class="text1">';
> >
> > $class = 'text[0-9]+'
> > if ($str =~ /class="$class"/)
> > {
> > print "TAG has this class\n";
> > }
>
> I still don't understand exactly why you cant use
>
> if ($str !~ /class="$class"/) {
> print "TAG doesn't have this class\n";
> }
>
> or even something ugly like
>
> if ($str =~ /class="$class"/) { }
> else {
> print "TAG doesn't have this class\n";
> }
>
> If you can describe to us a circumstance where you 'need' this
> functionality I'm sure we'll come up with an answer.
>
I'll explain what I'm trying to do.
I'm writing a program that will parse an HTML file.
This HTML file contains a text article that has been placed
between certain tags (such as <TD>) which have a specific class name.
So you can have something like
<TABLE>
<TR>
<TD class="articletext">
This article is just an example.
</TD>
</TR>
</TABLE>
And I have to pick "This article is just an example" from that
file.
Which class name to pick differs from file to file. For the previous
example I have to pick all text within a TD tag having class name
"articletext", but in another HTML file I might have to pick
all text within a SPAN tag having class name "anotherarticletext".
The class name to pick is decided by the file I'm parsing.
So, what did I do?
I created a map file. This map file contains the filename
and the tag-class combination which I have to pick.
I then read the HTML file and check whether it has that tag-class
combination. If it does, I get the text that falls within
that tag.
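
A minimal sketch of reading such a map file. The post does not give the file layout, so the "filename|tag|class-pattern" format and the name parse_map_lines are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Assumed layout (not from the original post): one entry per line,
#   filename|tag|class-pattern
sub parse_map_lines {
    my @lines = @_;
    my %map;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        # Limit of 3 so the class pattern itself may contain "|".
        my ($file, $tag, $class) = split /\|/, $line, 3;
        $map{$file} = { tag => $tag, class => $class };
    }
    return \%map;
}

my $map = parse_map_lines(
    "news.html|TD|articletext\n",
    "story.html|SPAN|anotherarticletext\n",
);
print "$map->{'story.html'}{tag} / $map->{'story.html'}{class}\n";
```

In a real program the lines would come from a file handle rather than a literal list.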
Assume $str contains a tag specification:
$str = "<TD class='articletext'>";
To check whether that tag-class combination exists,
I simply do:
if ($str =~ /<$tag class='$class'>/i) {
    # Take the text
}
else {
    # Don't take the text
}
This code helps a lot when I want to pick up a specific type
of class, for example all classes that start with the word
"text" followed by a number.
In that case the class name given in my map file is "text[0-9]+".
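
The check described above could be wrapped up roughly as follows. This is only a sketch: the helper name tag_has_class is hypothetical, and accepting either quote style around the class value is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str is an opening $tag whose class
# attribute matches the pattern $class (a regex taken from the map file).
sub tag_has_class {
    my ($str, $tag, $class) = @_;
    # \Q...\E quotes the tag name; $class is deliberately left as a regex.
    # (?:$class) keeps an alternation such as "a|b" inside the quotes,
    # and \1 requires the same quote character that opened the value.
    return $str =~ /<\Q$tag\E\s+class=(["'])(?:$class)\1\s*>/i ? 1 : 0;
}

for my $str (q{<TD class="text12">}, q{<TD class="textual">}) {
    my $verdict = tag_has_class($str, 'TD', 'text[0-9]+') ? 'take' : 'skip';
    print "$str: $verdict\n";
}
```

Because the closing quote is part of the pattern, "text[0-9]+" matches "text12" but not "textual".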
Beyond this, I also wanted to remove a few tag-class
combinations that occur inside the tags I want to pick up.
E.g.:
<TD class="articletext">
This text has to be picked up
<SPAN class="removetext">
This text has to be ignored
</SPAN>
This text has to also be picked up.
</TD>
So I wrote similar code to find the tags that I want to
remove, and if a tag-class combination matches, I ignore
it.
This code works fine when you give literal class names, and also
works for regex class names like "text[0-9]+".
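
One way the removal step might look, as a sketch only. The helper name strip_sections is hypothetical, and the code assumes the unwanted element never contains another element with the same tag name; a proper HTML parser would be needed for nested cases.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every <$tag class="$class"> ... </$tag> section.
sub strip_sections {
    my ($html, $tag, $class) = @_;
    # /s lets .*? cross line breaks; the non-greedy match stops at the
    # first closing tag, which is why nesting of the same tag is not handled.
    $html =~ s{<\Q$tag\E\s+class=(["'])(?:$class)\1\s*>.*?</\Q$tag\E\s*>}{}gis;
    return $html;
}

my $cell = <<'HTML';
<TD class="articletext">
This text has to be picked up
<SPAN class="removetext">
This text has to be ignored
</SPAN>
This text has to also be picked up.
</TD>
HTML

my $kept = strip_sections($cell, 'SPAN', 'removetext');
print $kept;
```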
But now one more situation has arisen: I want to remove all classes
other than the pick-up class.
So, if I'm picking up text from class "articletext", I want to
remove all classes other than "articletext".
I wanted to keep the current code setup and just change the removing
class name to something like "[^(articletext)]", expecting it to
remove all classes other than "articletext", but that doesn't work:
[^...] is a character class, so it matches one character that is not
in the set rather than anything other than the whole word.
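
For what it's worth, Perl's negative lookahead (?!...) can express "any class except this one" while still using =~. The sketch below is one possible approach, not something proposed in the thread; the helper name is_other_class is hypothetical and it assumes double-quoted class values.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str has a class attribute whose value
# is anything other than exactly $pick.
sub is_other_class {
    my ($str, $pick) = @_;
    # The lookahead rejects a value that is $pick followed directly by the
    # closing quote; [^"]+ then consumes whatever other value is there.
    my $other = '(?!' . quotemeta($pick) . '")[^"]+';
    return $str =~ /class="$other"/ ? 1 : 0;
}

for my $str ('<TD class="articletext">',
             '<SPAN class="removetext">',
             '<P class="articletext2">') {
    my $verdict = is_other_class($str, 'articletext') ? 'remove' : 'keep';
    print "$str: $verdict\n";
}
```

Since the pattern is just a string, something like '(?!articletext")[^"]+' could in principle sit in the map file as the class name to remove.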
I think I'll just add one more parameter to the map file, which
will tell me when to use "=~" and when to use "!~".
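
A rough sketch of that extra parameter. The field name "negate" and the helper should_remove are made up for illustration, since the actual map-file layout is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: $entry is a map entry with a class pattern and a
# "negate" flag that switches between =~ and !~ behaviour.
sub should_remove {
    my ($str, $entry) = @_;
    my $class = $entry->{class};
    my $hit   = $str =~ /class="(?:$class)"/i ? 1 : 0;
    # With negate set this behaves like !~, so a tag with no class
    # attribute at all also counts as "remove".
    return $entry->{negate} ? ($hit ? 0 : 1) : $hit;
}

my $all_but_article = { class => 'articletext', negate => 1 };
for my $str ('<TD class="articletext">', '<SPAN class="removetext">') {
    my $verdict = should_remove($str, $all_but_article) ? 'remove' : 'keep';
    print "$str: $verdict\n";
}
```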
Thanks for your help.
Bye,
George.