Oops sorry, wrong tree, barking. Bloody obvious really!

-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED]]
Sent: 18 March 2004 09:51
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Epressions

If I was generating the html page I wouldn't be using regex at all - I'd already have the data. I am parsing webpages produced by a program to get at the data and put it in a DB.

Giles Roadnight
http://giles.roadnight.name

-----Original Message-----
From: Damian Watson [mailto:[EMAIL PROTECTED]]
Sent: 18 March 2004 09:42
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Epressions

Why don't you make each relevant tr something like <tr class="meeting"> which means you can identify each required row more easily... (and each td item within that that is required should have a class so you can say it's there). My regex isn't good enough to give you any example!

-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED]]
Sent: 18 March 2004 09:31
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Epressions

Although having said that, getting the registered drivers link is the easy part - I can manage that myself.

Giles Roadnight
http://giles.roadnight.name

-----Original Message-----
From: Giles Roadnight [mailto:[EMAIL PROTECTED]]
Sent: 18 March 2004 09:30
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] OT: Regular Epressions

Thanks for the post Paul.

I did try <tr[^>]*>.*</tr> but the middle bit matches anything - including </tr> - so I get the whole of the rest of the page. I don't actually want any of the row returned - I just want to make sure that this row is in the correct format (i.e. has 3 cells with meeting, date and venue in) so that I can start looping through the remaining rows in the table to get what I want.

Can anyone else help with this? I have attached the page I am working with (if this list allows attachments). What I want to do is get the address of each meeting index file (in this case there is only 1, at mtg11/index.htm), the file name of the registered drivers page and the file name of the series file (in this case ser1/series.htm).

Thanks

Giles Roadnight
http://giles.roadnight.name

-----Original Message-----
From: Paul Johnston [mailto:[EMAIL PROTECTED]]
Sent: 18 March 2004 09:20
To: [EMAIL PROTECTED]
Subject: Re: [ cf-dev ] OT: Regular Epressions

Giles,

> I want to be able to find a certain row in an html document. To do this
> I need to pad the spaces where I don't know what will be there (<font,
> <strong tags etc.) with [^somecharacter]* but I don't know what character
> I should use. Really I want to say [^</tr]* but that means not < or /
> etc., which doesn't work as the <font tags also have </font tags.

I'm quite confused by this! It's not entirely clear what you want to do, so let's try and figure this out!

I am assuming that by a row, you mean the bits between a <tr> and a </tr>. So, you are trying to find:

1) <tr>, although it may have attributes so it would be <tr[^>]*>
2) anything that isn't a </tr>
3) </tr>

The first and last bits are easy so the regex can begin to take shape:

<tr[^>]*>[[2]]</tr>

It's just the [[2]] bit that we're now interested in! And it's a lot simpler than you may think. Remember that a regex is going to search for the WHOLE string, not just look for the next bit of itself. And the regex knows that the string ends in </tr> so it will look for something starting with a <tr> tag and ending with a </tr> tag, won't it! In other words this should work (untested):

<tr[^>]*>.*</tr>

But, the point is, do you need the [[2]] bit? With a regular expression you are finding the start of the string, and not the actual string itself.
To return the string, you need to find:

1) the end of the <tr> tag + 1
2) the start of the closing </tr> tag - do a find on </tr> using a start position of (1)
3) Do a Mid on the string with those values

So instead of a regex, it becomes (htmlstr is the string you are working on):

<cfscript>
    // find end of opening <tr>
    start = Find(">", htmlstr, Find("<tr", htmlstr)) + 1;
    // find ending </tr>
    end = Find("</tr>", htmlstr, start);
    // everything between the opening tag and the closing tag
    trstring = Mid(htmlstr, start, end - start);
</cfscript>

and out pops trstring! This could easily just be a one liner too!

No need for a regex anywhere (although you could use the regex above in place of the start calculation)! Remember though this will return what is INSIDE the tag, and not the actual tag itself. To get the tag as well, find the start of the <tr and the end of the </tr> and it will pop out!

Paul

--
These lists are synchronised with the CFDeveloper forum at http://forum.cfdeveloper.co.uk/
Archive: http://www.mail-archive.com/dev%40lists.cfdeveloper.co.uk/

CFDeveloper Sponsors and contributors:-
*Hosting and support provided by CFMXhosting.co.uk* :: *ActivePDF provided by activepdf.com*
*Forums provided by fusetalk.com* :: *ProWorkFlow provided by proworkflow.com*
*Tutorials provided by helmguru.com* :: *Lists hosted by gradwell.com*

To unsubscribe, e-mail: [EMAIL PROTECTED]
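
A follow-up on the `.*` problem raised in the 09:30 message, where the pattern ran on to the end of the page. The usual cause is that `.*` is greedy: it extends to the last </tr> in the document rather than the first. A lazy quantifier (`.*?`) is the common fix. The sketch below is in Python only because it is easy to run and check; whether a given ColdFusion version accepts `.*?` in REFind is an assumption to verify against its documentation. The sample page and the names `first_row_inner` and `is_header_row` are hypothetical, made up for illustration, and are not taken from the attached file.

```python
import re

# Hypothetical sample page: a header row plus one data row, with a <font>
# tag inside a cell to mimic the markup described in the thread.
html = (
    '<table>'
    '<tr class="head"><td><font size="2">Meeting</font></td>'
    '<td>Date</td><td>Venue</td></tr>'
    '<tr><td><a href="mtg11/index.htm">Round 1</a></td>'
    '<td>18 March</td><td>Somewhere</td></tr>'
    '</table>'
)

FLAGS = re.IGNORECASE | re.DOTALL  # DOTALL lets "." cross line breaks

# Greedy: .* runs to the LAST </tr>, so a single match swallows both rows.
greedy = re.findall(r"<tr[^>]*>.*</tr>", html, FLAGS)

# Lazy: .*? stops at the FIRST </tr>, so each row is its own match.
lazy = re.findall(r"<tr[^>]*>.*?</tr>", html, FLAGS)


def first_row_inner(page):
    """Find/Mid approach from the thread: the text between the first
    opening <tr ...> tag and the next </tr>, or None if either is missing."""
    open_at = page.find("<tr")
    if open_at == -1:
        return None
    start = page.find(">", open_at) + 1   # first character after the tag
    end = page.find("</tr>", start)
    if start == 0 or end == -1:
        return None
    return page[start:end]


def is_header_row(row):
    """True if the row has exactly three cells reading meeting, date, venue
    once any inner tags such as <font> are stripped."""
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row, FLAGS)
    text = [re.sub(r"<[^>]+>", "", cell).strip().lower() for cell in cells]
    return text == ["meeting", "date", "venue"]
```

With the lazy pattern, each row can be checked in turn: test the first match with `is_header_row`, then loop over the remaining matches to pull out the links. Note that `first_row_inner` is case-sensitive, like the Find calls in the original snippet, so it would miss an upper-case <TR>.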