RE: [ cf-dev ] OT: Regular Expressions

Damian Watson Thu, 18 Mar 2004 01:58:53 -0800

Oops sorry, wrong tree, barking. Bloody obvious really!

-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED] 
Sent: 18 March 2004 09:51
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Expressions


If I was generating the html page I wouldn't be using regex at all - I'd
already have the data.

I am parsing webpages produced by a program to get at the data and put
it in a DB.

Giles Roadnight
http://giles.roadnight.name


-----Original Message-----
From: Damian Watson [mailto:[EMAIL PROTECTED] 
Sent: 18 March 2004 09:42
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Expressions

Why don't you make each relevant tr something like <tr class="meeting">
which means you can identify each required row more easily... (and each
td item within that that is required should have a class so you can say
it's there).

My regex isn't good enough to give you any example!

-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED] 
Sent: 18 March 2004 09:31
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Expressions

Although, having said that, getting the registered drivers link is easy -
I can manage that myself.

Giles Roadnight
http://giles.roadnight.name


-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED] 
Sent: 18 March 2004 09:30
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Expressions

Thanks for the post Paul. I did try <tr[^>]*>.*</tr> but the middle bit
matches anything - including </tr> so I get the whole of the rest of the
page.
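
The run-on match described above is what a greedy .* does: it grabs as
much as it can and only backs off to the LAST </tr>. A minimal sketch of
the difference, written in Python so it can be run on its own (the sample
string is invented, and it is an assumption, not something confirmed in
this thread, that the ColdFusion version in use accepts the lazy .*?
form):

```python
import re

# Made-up page fragment with two rows followed by more markup.
page = '<tr class="a"><td>one</td></tr>\n<tr><td>two</td></tr>\n<p>rest of page</p>'

# Greedy: .* runs as far as it can, then backtracks to the last </tr>.
greedy = re.search(r'<tr[^>]*>.*</tr>', page, re.DOTALL)

# Lazy: .*? grows one character at a time and stops at the first </tr>.
lazy = re.search(r'<tr[^>]*>.*?</tr>', page, re.DOTALL)

greedy_text = greedy.group(0)
lazy_text = lazy.group(0)
```

Here greedy_text spans both rows, while lazy_text holds only the first.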

I don't actually want any of the row returned - I just want to make sure
that this row is in the correct format (i.e. has 3 cells with meeting,
date and venue in) so that I can start looping through the remaining
rows in the table to get what I want.
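
One way to make that format check, sketched in Python: pull out the cells
of a single row, strip any <font>/<strong> style tags inside them, and
compare the remaining text with the expected headings. The helper names,
the sample rows and the exact heading text ("meeting", "date", "venue")
are assumptions for illustration only; the real page may differ.

```python
import re

def cell_texts(row_html):
    """Return the lower-cased visible text of each <td>/<th> cell in one row."""
    cells = re.findall(r'<t[dh]\b[^>]*>(.*?)</t[dh]>', row_html,
                       re.IGNORECASE | re.DOTALL)
    # Drop any tags nested inside the cell, then trim whitespace.
    return [re.sub(r'<[^>]+>', '', cell).strip().lower() for cell in cells]

def is_header_row(row_html):
    """True only if the row has exactly the three expected heading cells."""
    return cell_texts(row_html) == ['meeting', 'date', 'venue']

# Invented sample rows.
header = ('<tr><td><font size="2"><strong>Meeting</strong></font></td>'
          '<td><b>Date</b></td><td>Venue</td></tr>')
other = '<tr><td>Round 1</td><td>18 March</td></tr>'
```

With these samples, is_header_row(header) is True and is_header_row(other)
is False, so the loop over the remaining rows can start right after the
first row that passes.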

Can anyone else help with this? I have attached the page I am working
with (if this list allows attachments). What I want to do is get the
address of each meeting index file (in this case there is only 1, at
mtg11/index.htm), the file name of the registered drivers page and the
file name of the series file (in this case ser1/series.htm).
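
If the goal is only the link addresses, an HTML parser can stand in for
the regex entirely. The sketch below uses Python's standard html.parser
module, which is a swapped-in technique and not what anyone in this
thread used. The page fragment is invented; only the two paths come from
the message above, and the registered drivers file name is left out
because it is not given.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)

# Invented fragment containing the two paths mentioned in the post.
page = (
    '<table><tr><td><a href="mtg11/index.htm">Meeting 11</a></td></tr></table>'
    '<p><a href="ser1/series.htm">Series</a></p>'
)

collector = LinkCollector()
collector.feed(page)

# Assumed naming pattern: meeting pages end in /index.htm, series in series.htm.
meeting_links = [h for h in collector.hrefs if h.endswith('/index.htm')]
series_links = [h for h in collector.hrefs if h.endswith('series.htm')]
```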

Thanks

Giles Roadnight
http://giles.roadnight.name


-----Original Message-----
From: Paul Johnston [mailto:[EMAIL PROTECTED] 
Sent: 18 March 2004 09:20
To: [EMAIL PROTECTED]
Subject: Re: [ cf-dev ] OT: Regular Expressions

Giles,

> I want to be able to find a certain row in an html document. To do this
> I need to pad the spaces where I don't know what will be there (<font,
> <strong tags etc.) with [^somecharacter]* but I don't know what character
> I should use. Really I want to say [^</tr]* but that means not < or /
> etc., which doesn't work as the <font tags also have </font tags.
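
The quoted problem comes from [^</tr]* being a character class: it
excludes the single characters <, /, t and r, not the sequence "</tr".
One common way to say "anything that does not start a </tr" is a negative
lookahead checked before each character. A rough sketch in Python with
made-up markup (that the ColdFusion regex engine accepts the same (?!...)
syntax is assumed here, not verified):

```python
import re

# Made-up sample markup: two rows, with nested tags inside the first one.
html = (
    '<table>'
    '<tr bgcolor="#cccccc"><td><font size="2"><strong>Meeting</strong></font></td></tr>'
    '<tr><td>Other</td></tr>'
    '</table>'
)

# Character class: stops at the first "<" (the one in "<td>"), so the
# literal </tr> that must follow is never reached and nothing matches.
char_class = re.compile(r'<tr[^>]*>[^</tr]*</tr>', re.IGNORECASE)

# Lookahead: consume one character at a time, but only while the text at
# that position does not begin with "</tr".
lookahead = re.compile(r'<tr[^>]*>((?:(?!</tr).)*)</tr>',
                       re.IGNORECASE | re.DOTALL)

no_match = char_class.search(html)
first_row = lookahead.search(html)
row_contents = first_row.group(1)
```

The </font>, </strong> and </td> tags inside the row are skipped over
because none of them begins with "</tr".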

I'm quite confused by this!  It's not entirely clear what you want to 
do, so let's try and figure this out!

I am assuming that by a row, you mean the bits between a <tr> and a
</tr>.

So, you are trying to find:

1) <tr> although it may have attributes so it would be <tr[^>]*>
2) anything that isn't a </tr>
3) </tr>

The first and last bits are easy, so the regex can begin to take shape:

<tr[^>]*>[[2]]</tr>

It's just the [[2]] bit that we're now interested in!  And it's a lot
simpler than you may think.  Remember that a regex is going to search
for the WHOLE string, not just look for the next bit of itself. And the
regex knows that the string ends in </tr> so it will look for something
starting with a <tr> tag and ending with a </tr> tag, won't it!  In other
words this should work (untested):

<tr[^>]*>.*</tr>

But, the point is, do you need the [[2]] bit?  With a regular expression
you are finding the start of the string, and not the actual string
itself. To return the string, you need to find:

1) the end of the <tr> tag + 1
2) the start of the closing </tr> tag - do a find on </tr> using a start
position of (1)
3) Do a Mid on the string with those values

So instead of a regex, it becomes (htmlstr is the string you are working
on):

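<cfscript>
        // position just after the ">" that closes the first opening <tr ...> tag
        start = find(">", htmlstr, find("<tr", htmlstr)) + 1;
        // position of the first closing </tr> after that point
        end = find("</tr>", htmlstr, start);
        // everything between the two tags (the row contents, without the tags)
        trstring = mid(htmlstr, start, end - start);
</cfscript>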

and out pops trstring! This could easily just be a one liner too! No
need for a regex anywhere (although you can use the regex above instead of
the start equation)!  Remember though this will return what is INSIDE the
tag, and not the actual tag itself. To do that, find the start of the
<tr and the end of the </tr> and it will pop out!
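
The same find-and-slice idea, including the variant that keeps the tags,
might look like this in Python. It is a sketch only: the function name,
the include_tags flag, the case-insensitive search and the None return
for a missing row are all additions for illustration, not part of the
approach as posted.

```python
def first_row(htmlstr, include_tags=False):
    """Return the first <tr>...</tr> row, or None if no complete row exists."""
    # Search a lower-cased copy so <TR> is found too; the offsets line up
    # with the original string for plain ASCII markup.
    lowered = htmlstr.lower()
    open_at = lowered.find('<tr')
    if open_at == -1:
        return None
    tag_end = lowered.find('>', open_at)
    if tag_end == -1:
        return None
    inner_start = tag_end + 1          # first character after the opening tag
    close_at = lowered.find('</tr>', inner_start)
    if close_at == -1:
        return None
    if include_tags:
        # From the "<" of <tr to just past the ">" of </tr>.
        return htmlstr[open_at:close_at + len('</tr>')]
    return htmlstr[inner_start:close_at]

# Invented sample with upper-case tags on the first row.
sample = '<p>x</p><TR class="a"><td>one</td></TR><tr><td>two</td></tr>'
inside = first_row(sample)
whole = first_row(sample, include_tags=True)
```

Checking each find result before slicing avoids returning a nonsense
substring when the page has no row at all.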

Paul

-- 
These lists are synchronised with the CFDeveloper forum at
http://forum.cfdeveloper.co.uk/
Archive: http://www.mail-archive.com/dev%40lists.cfdeveloper.co.uk/
 
CFDeveloper Sponsors and contributors:-
*Hosting and support provided by CFMXhosting.co.uk* :: *ActivePDF
provided by activepdf.com*
      *Forums provided by fusetalk.com* :: *ProWorkFlow provided by
proworkflow.com*
           *Tutorials provided by helmguru.com* :: *Lists hosted by
gradwell.com*

To unsubscribe, e-mail: [EMAIL PROTECTED]


