Re: How can I extract cell data (content surrounded by <td></td>) from a <table> in HTML response?

Hi,
the regex you are using doesn't seem correct:
[^tr]
is a character class that matches any single character that is neither 't'
nor 'r'; it does not mean "not the sequence tr".
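Here is a quick way to see the difference, sketched with java.util.regex. JMeter uses the Jakarta ORO Perl5 engine, whose syntax for these constructs is close, but I have not checked every case against ORO. The character class consumes exactly one character. Expressing "a run containing no </tr>" needs a negative lookahead tested at every position instead.

```java
import java.util.regex.Pattern;

// [^tr] is a character class: it matches exactly ONE character that is
// neither 't' nor 'r'. It does not mean "not the sequence tr".
public class NegatedClassDemo {
    static final Pattern NOT_T_OR_R = Pattern.compile("[^tr]");

    // "A run of characters that does not contain </tr>" needs a negative
    // lookahead checked before every character (a "tempered" dot).
    static final Pattern NO_CLOSING_TR = Pattern.compile("(?is)(?:(?!</tr>).)*");

    static boolean matchesWhole(String s) {
        return NOT_T_OR_R.matcher(s).matches();
    }

    static boolean containsNoClosingTr(String s) {
        return NO_CLOSING_TR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("x"));   // true
        System.out.println(matchesWhole("t"));   // false: 't' is excluded
        System.out.println(matchesWhole("td"));  // false: the class covers one char only
        System.out.println(containsNoClosingTr("<td>a</td><td>b</td>")); // true
        System.out.println(containsNoClosingTr("<td>a</td></TR><tr>"));  // false
    }
}
```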


Also, if you are getting several <tr> elements instead of the one you
expect, your regex is probably too greedy: try replacing the .* constructs
with the reluctant .*? form, or otherwise tighten the regex.
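Here is a minimal illustration with java.util.regex on a made-up two-row snippet (the class name "row" is hypothetical). The greedy group runs to the last </tr> in the input, while the reluctant one stops at the first. The .* inside tgDataLine.*1\) in your pattern is greedy too and can also run across rows, so it needs the same treatment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical two-row sample; the real page will differ.
    static final String HTML =
        "<tr class=\"row\"><td>1</td></tr>\n<tr class=\"row\"><td>2</td></tr>";

    // Collects group 1 of every successive match in the input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy .* runs to the LAST </tr>: a single match swallowing both rows.
        System.out.println(captures("(?is)<tr[^>]*>(.*)</tr>", HTML).size()); // 1
        // Reluctant .*? stops at the FIRST </tr>: one match per row.
        System.out.println(captures("(?is)<tr[^>]*>(.*?)</tr>", HTML));       // [<td>1</td>, <td>2</td>]
    }
}
```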

In any case, XPath is just as dependent on the HTML structure as a regex is
(e.g. what happens if you move to a tableless layout?).


regards
deepak

On Thu, Nov 19, 2009 at 8:17 AM, rosiere <[email protected]> wrote:

>
> Hello,
>
> Thanks for your advice.
>
> I did apply a case-insensitive check, like this:
>
> (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>
> However, I still have a problem: now I capture all the <tr> elements in the
> same group instead of one <tr> element per match.
>
> I see the following matching information in my jmeter.log:
>
> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex = (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Match found!
> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #0 = 1
> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #1 =
> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex Extractor result =
> <TD>....<TD>
> <TR>...</TR>
> ...
> <TR>....</TR>
> <TD>
>
>
> As for alternatives, I did want to parse the HTML with the org.w3c.dom API,
> but DOM methods like getElementsByTagName() are case-sensitive, so they may
> not cope with HTML that mixes uppercase and lowercase tags.
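For what it's worth, the case sensitivity is easy to confirm with the JDK's own DOM parser on a tiny, well-formed, hypothetical sample. The sketch also shows one simple workaround: walk getElementsByTagName("*") and compare names while ignoring case. Bear in mind that real-world HTML is often not well-formed XML (for example, unclosed <td> tags or bare attributes). In that case DocumentBuilder throws before any lookup happens.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomCaseDemo {
    // Parses a well-formed XML/XHTML string; malformed HTML makes this throw.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Exact, case-sensitive lookup: "tr" does not find <TR>.
    static int count(String xml, String tagName) {
        return parse(xml).getElementsByTagName(tagName).getLength();
    }

    // Workaround: visit every element ("*") and compare names ignoring case.
    static int countIgnoreCase(String xml, String tagName) {
        NodeList all = parse(xml).getElementsByTagName("*");
        int n = 0;
        for (int i = 0; i < all.getLength(); i++) {
            if (((Element) all.item(i)).getTagName().equalsIgnoreCase(tagName)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String xml = "<table><TR><td>a</td></TR><tr><td>b</td></tr></table>";
        System.out.println(count(xml, "tr"));           // 1
        System.out.println(countIgnoreCase(xml, "tr")); // 2
    }
}
```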
>
> Besides, whenever the HTML page changes, I would have to rewrite my
> DOM-based Java code. To minimize such unwanted effects on my Java code, I
> would still like to use a regex: whenever the HTML structure changes, I only
> need to change the regex in JMeter, not the Java code that consumes the
> extracted HTML portions.
>
>
>
> Deepak Shetty wrote:
> >
> > You should probably make the check case-insensitive. But I agree with
> > sebb: parsing HTML constructs with regex is a pain and breaks quite
> > frequently.
> > regards
> > deepak
> >
> > On Wed, Nov 18, 2009 at 10:37 AM, Andre Arnold <[email protected]> wrote:
> >
> >> sebb wrote:
> >> > On 18/11/2009, rosiere <[email protected]> wrote:
> >> >
> >> >>  Hello,
> >> >>
> >> >>  I found that JMeter's ORO regex is somewhat different from Java's.
> >> >>
> >> >
> >> > Yes.
> >> >
> >> > But not all that different; and neither is particularly well suited to
> >> > this task.
> >> >
> >> > The XPath Extractor will probably be much easier to use.
> >> >
> >> >
> >> > http://jakarta.apache.org/jmeter/usermanual/component_reference.html#XPath_Extractor
> >> >
> >> > This was discussed on the mailing list earlier this year.
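Here is a rough sketch of the XPath route using the JDK's javax.xml.xpath. It assumes the page has already been turned into well-formed markup with no default namespace; in JMeter, the XPath Extractor's tolerant-parser (Tidy) option is meant for that. The row filter starts-with(@class, 'tgDataLine') and the sample table are assumptions based on the class name in this thread. Inside JMeter, the equivalent expression would be along the lines of //tr[starts-with(@class,'tgDataLine')]/td[8].

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCellsDemo {
    // Hypothetical row filter: rows whose class starts with "tgDataLine".
    static final String ROWS = "//tr[starts-with(@class, 'tgDataLine')]";

    // Returns the text of the 8th and 9th <td> of every matching row, in order.
    static List<String> eighthAndNinth(String xhtml) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate(
                ROWS, new InputSource(new StringReader(xhtml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                out.add(xp.evaluate("td[8]", row)); // XPath positions are 1-based
                out.add(xp.evaluate("td[9]", row));
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Made-up well-formed page: a header row plus two data rows of nine cells.
    static String sample() {
        StringBuilder sb = new StringBuilder("<table>");
        sb.append("<tr class=\"header\"><td>h</td></tr>");
        for (int r = 1; r <= 2; r++) {
            sb.append("<tr class=\"tgDataLine").append(r).append("\">");
            for (int c = 1; c <= 9; c++) {
                sb.append("<td>r").append(r).append('c').append(c).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(eighthAndNinth(sample())); // [r1c8, r1c9, r2c8, r2c9]
    }
}
```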
> >> >
> >> >
> >> >>  Now I need to iterate over the <tr> elements that match a pattern,
> >> >>  capture all the <td> elements within each <tr>, and select the 8th
> >> >>  and 9th <td>.
> >> >>
> >> >>  Since many <tr> elements appear in the HTML response, I have to
> >> >>  capture the <tr> elements one by one, without putting two of them in
> >> >>  the same group; that is, I must avoid capturing consecutive
> >> >>  <tr>..</tr><tr>..</tr> into one group.
> >> >>
> >> >>  By writing (?is)<tr\sclass="tgDataLine.*1\)\" >(.*)</tr> I capture
> >> >>  only one group, which contains many real <tr> elements.
> >> >>  So what should I write in the regex?
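If you stay with regular expressions, a two-pass approach avoids having to describe a whole row in one pattern. First find each data row with a reluctant pattern, then collect the <td> cells inside that row only and pick indexes 7 and 8. Below is a sketch in Java with two assumptions: the rows carry a class starting with tgDataLine, and every cell has a closing </td>. HTML allows omitting it, in which case the cell pattern needs adjusting. sample() is a made-up page with mixed-case tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowCellRegexDemo {
    // Pass 1: one match per data row. [^>]* and [^"]* cannot leave the tag or the
    // attribute value, and (.*?) stops at the first </tr>, so no match spans two rows.
    static final Pattern ROW = Pattern.compile(
        "(?is)<tr\\s[^>]*class=\"tgDataLine[^\"]*\"[^>]*>(.*?)</tr>");
    // Pass 2: applied only to the content of one row.
    static final Pattern CELL = Pattern.compile("(?is)<td[^>]*>(.*?)</td>");

    // Returns the 8th and 9th cell text of every data row, in document order.
    static List<String> eighthAndNinth(String html) {
        List<String> out = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            if (cells.size() >= 9) {     // rows with fewer cells are skipped
                out.add(cells.get(7));   // 8th cell (index 7)
                out.add(cells.get(8));   // 9th cell (index 8)
            }
        }
        return out;
    }

    // Made-up page: a header row plus two data rows, one with uppercase tags.
    static String sample() {
        StringBuilder sb = new StringBuilder("<table>\n<tr class=\"header\"><td>h</td></tr>\n");
        for (int r = 1; r <= 2; r++) {
            String tr = (r == 1) ? "TR" : "tr";
            String td = (r == 1) ? "TD" : "td";
            sb.append('<').append(tr).append(" class=\"tgDataLine").append(r).append("\">");
            for (int c = 1; c <= 9; c++) {
                sb.append('<').append(td).append(">r").append(r).append('c').append(c)
                  .append("</").append(td).append('>');
            }
            sb.append("</").append(tr).append(">\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(eighthAndNinth(sample())); // [r1c8, r1c9, r2c8, r2c9]
    }
}
```

The Java code that consumes the result depends only on the two patterns, so a markup change should mostly mean editing ROW or CELL.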
> >> >>
> >> >>
> >> In case you still need a pattern for this: I found that the following
> >> matches the number you wanted and the value of the following column.
> >>
> >> reference: ref
> >> pattern:     (?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>
> >> template:  $1$$2$
> >> match :     1
> >>
> >> In ref_g1 you'll find the number.
> >> In ref_g2 you'll find the following column value.
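Running the pattern through java.util.regex shows what ref_g1 and ref_g2 would receive for successive match numbers. Note that each <TD.+?> needs at least one character between <TD and >, so the pattern assumes the cells carry attributes; with bare <TD> tags it finds nothing. The TIGHT variant in the sketch is my own assumption, not from the thread: it is case-insensitive and uses <td[^>]*> instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberAndNextCell {
    // Andre's pattern, with the digit class written as [0-9]
    // ([1-9|0] would also accept a literal '|').
    static final Pattern ANDRE =
        Pattern.compile("(?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>");
    // Hypothetical variant: case-insensitive, and <td[^>]*> also accepts bare <td>.
    static final Pattern TIGHT =
        Pattern.compile("(?is)<tr[^>]*>.*?<td[^>]*>([0-9]+)</td>.*?<td[^>]*>(.*?)</td>");

    // "g1 -> g2" for each successive match, i.e. Match No. 1, 2, 3, ...
    static List<String> pairs(Pattern p, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String withAttrs = "<TR class=\"r\"><TD class=\"n\">42</TD><TD class=\"v\">foo</TD></TR>";
        String bare = "<TR><TD>42</TD><TD>foo</TD></TR>";
        System.out.println(pairs(ANDRE, withAttrs)); // [42 -> foo]
        System.out.println(pairs(ANDRE, bare));      // []
        System.out.println(pairs(TIGHT, bare));      // [42 -> foo]
    }
}
```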
> >>
> >> To catch all the matches, you need to increment a counter for the match
> >> number and check whether there is another match or not.
> >>
> >> Your test plan should look something like this:
> >>
> >> -While Controller (${__javaScript("${ref}"!="error")})
> >> --Counter (start at 1, increment 1; used as the regex Match No.)
> >> --HTTP Sampler (to get your site)
> >> ---Regular Expression Extractor (as shown above, with Default Value "error")
> >> --If Controller ("${ref}"!="error", the same test as the While Controller)
> >> ---your JDBC action (using ref_g1 & ref_g2)
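The While/Counter/If structure above boils down to "ask for match 1, 2, 3, ... until the extractor falls back to its default". The Java sketch below models that loop. nthMatch() emulates Match No. = n, and null stands in for ${ref} being "error"; the names and sample are hypothetical. In the test plan the page is re-requested on every pass, while here the same string is rescanned. JMeter's extractor also accepts a negative Match No., which extracts all matches in one go as ref_matchNr, ref_1_g1, ref_1_g2 and so on. That may make the re-request unnecessary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchNumberLoop {
    // Andre's pattern, with the digit class written as [0-9].
    static final Pattern P =
        Pattern.compile("(?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>");

    // Hypothetical two-row page.
    static final String SAMPLE =
        "<TR class=\"a\"><TD class=\"n\">1</TD><TD class=\"v\">x</TD></TR>"
        + "<TR class=\"a\"><TD class=\"n\">2</TD><TD class=\"v\">y</TD></TR>";

    // Emulates Match No. = n (1-based): {g1, g2} of the n-th match, or null when
    // there is no n-th match (JMeter would then store the Default Value "error").
    static String[] nthMatch(String html, int n) {
        Matcher m = P.matcher(html);
        for (int i = 1; m.find(); i++) {
            if (i == n) {
                return new String[] {m.group(1), m.group(2)};
            }
        }
        return null;
    }

    // Mirrors the While/Counter/If plan: keep raising the counter until no match is left.
    static List<String> collect(String html) {
        List<String> used = new ArrayList<>();
        for (int counter = 1; ; counter++) {
            String[] g = nthMatch(html, counter);
            if (g == null) {
                break;                        // ${ref} == "error": the While loop ends
            }
            used.add(g[0] + "|" + g[1]);      // where the JDBC step would use ref_g1 / ref_g2
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(collect(SAMPLE)); // [1|x, 2|y]
    }
}
```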
> >>
> >>
> >> Hope I got your problem right.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26421379.html
> Sent from the JMeter - User mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
