RE: RegEx (again!)

Pascal Peters Fri, 03 Dec 2004 01:09:13 -0800

This is something you can't do (easily) with a regexp. The class [^</span>]
means any single character except <, /, s, p, a, n or >. Remember that []
matches a SINGLE character, not a string. If you want to remove empty span
tags (a <span> with no attributes), do it in 4 steps:
1. find empty span tag (no regexp needed)
2. remove it
3. find matching end tag (note that span tags can be nested, so it is
not necessarily the next </span>)
4. remove it
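
To illustrate the point about [], here is a minimal sketch (the sample
strings are made up for illustration). The pattern from the original
message only matches when the text between the tags contains none of the
characters <, /, s, p, a, n, >:

<cfscript>
pattern = "<span>([^</span>]*)</span>";
// "hello" contains none of the excluded characters, so the tags are stripped
a = REReplaceNoCase("<span>hello</span>", pattern, "\1", "ALL");
// "apple" contains a and p, so nothing matches and the string is unchanged
b = REReplaceNoCase("<span>apple</span>", pattern, "\1", "ALL");
// the nested <b> tag starts with <, so nothing matches here either
c = REReplaceNoCase("<span><b>hi</b></span>", pattern, "\1", "ALL");
</cfscript>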


<cfscript>
function RemoveSpan(text){
        var start1 = 1;
        var start2 = 1;
        var pos = 0;
        var stTmp = StructNew();
        var cnt = 0;
        while(true){
                // find empty span
                pos = FindNoCase("<span>",text,start1);
                if(NOT pos) break;
                // remove empty span tag
                text = RemoveChars(text,pos,Len("<span>"));
                // find matching end tag
                cnt = 1;
                start2 = pos;
                start1 = pos;
                while(true){
                        stTmp = REFindNoCase("</?span[^>]*>",text,start2,true);
                        if(NOT stTmp.pos[1]) break;
                        start2 = stTmp.pos[1] + stTmp.len[1];
                        if(Mid(text,stTmp.pos[1]+1,1) IS "/") cnt = cnt - 1;
                        else cnt = cnt + 1;
                        if(cnt IS 0){
                                // remove matching end tag
                                text = RemoveChars(text,stTmp.pos[1],stTmp.len[1]);
                                break;
                        }
                }
        }
        return text;
}
</cfscript>
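
A possible usage sketch for the function above, with the sample string
from the original message plus a nested case (the second string is made
up for illustration):

<cfscript>
s1 = "<P><SPAN><b>Duane Boudreau</b></SPAN></P><P><SPAN>[EMAIL PROTECTED]</SPAN></P>";
r1 = RemoveSpan(s1);
// expected: <P><b>Duane Boudreau</b></P><P>[EMAIL PROTECTED]</P>

// the inner span has an attribute, so it and its own end tag are kept
s2 = '<span><span class="x">a</span>b</span>';
r2 = RemoveSpan(s2);
// expected: <span class="x">a</span>b
</cfscript>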

Pascal

> -----Original Message-----
> From: Duane Boudreau [mailto:[EMAIL PROTECTED]
> Sent: 03 December 2004 01:58
> To: CF-Talk
> Subject: RegEx (again!)
> 
> I thought I was finally catching on to this regex stuff but I guess not.
> 
> I am using a tag from the developer exchange that strips a bunch of extra
> html that Word inserts into an HTML document. The text left over leaves a
> lot of extra <span>xyx</span> formatting. Since the opening <span> is empty
> I would like to remove them from the text.  I tried this but no luck.
> 
> textString = "<P><SPAN><b>Duane Boudreau</b></SPAN></P><P><SPAN>[EMAIL PROTECTED]</SPAN></P>";
> 
> textString = reReplaceNoCase(textString, "<span>([^</span>]*)</span>", "\1", "ALL");
> 
> I thought that the text between the () meant all text except </span> between
> the opening string "<span>" and the next "</span>"
> 
> What am I doing wrong here?
> 
> Thanks,
> Duane
> 

Message: http://www.houseoffusion.com/lists.cfm/link=i:4:186056