Re: How can I extract cell data (content surrounded by <td></td>) from a <table> in HTML response?

Is there any reason why you aren't using XPath?

Extractor1  = varCol8 = //table/tr[td[position()=1 and text()='313609133']]/td[8]
Extractor2  = varCol9 = //table/tr[td[position()=1 and text()='313609133']]/td[9]


This assumes that if there is an 8th column, there will always be a 9th. I'm
not quite sure how to use a single extractor to extract both columns, but you
should be able to loop through the values with an explicit counter.
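
To check the two expressions outside JMeter, here is a minimal Java sketch that evaluates the same XPath against a small hand-made table. The class name, the sample markup, and the cell values are all invented for illustration, and the input is assumed to be well-formed XML. JMeter's XPath Extractor has its own parsing step, so treat this only as a way to test the expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathCellSketch {

    /** Builds a hypothetical two-row, nine-column table (well-formed, no DOCTYPE). */
    static String sampleTable() {
        StringBuilder sb = new StringBuilder("<html><body><table>");
        for (String id : new String[] {"100000001", "313609133"}) {
            sb.append("<tr><td>").append(id).append("</td>");
            for (int col = 2; col <= 9; col++) {
                sb.append("<td>").append(id).append("-c").append(col).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table></body></html>").toString();
    }

    /** Returns the text of the given column in the row whose first cell equals key, or "" if none. */
    static String cell(String xhtml, String key, int column) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//table/tr[td[position()=1 and text()='" + key + "']]/td[" + column + "]";
        // evaluate(String, Object) returns the string value of the first matching node
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        String table = sampleTable();
        System.out.println("varCol8 = " + cell(table, "313609133", 8));
        System.out.println("varCol9 = " + cell(table, "313609133", 9));
    }
}
```

If the real page wraps its rows in a <tbody> element, the //table/tr step would need adjusting; the sample table here does not have one.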

regards
deepak


On Fri, Nov 20, 2009 at 9:08 AM, Deepak Shetty <[email protected]> wrote:

> Hi
> If you need JMeter to iterate over variables with a ForEach Controller, the
> variable names must have a specific form.
>
> http://jakarta.apache.org/jmeter/usermanual/component_reference.html#ForEach_Controller
> So if you had an array of strings:
> //pseudo: ForEach reads varName_1, varName_2, ... so number from 1
> for (int i = 0; i < list.length; i++) {
>     vars.put("varName_" + (i + 1), list[i]);
> }
> I can't remember offhand whether you also need varName_n=count (the total
> number); try it out.
> Then you should be able to use a ForEach Controller with varName.
>
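
The naming convention described above can be tried out in plain Java. This is only a sketch: a Map stands in for JMeter's vars object, the class and method names are made up, and the loop that reads the values back is my reading of how a ForEach Controller walks the names (prefix_1, prefix_2, ... until one is missing), not JMeter's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ForEachVarsSketch {

    /** Stores each item under prefix_1 .. prefix_n; the Map is a stand-in for JMeter's vars. */
    static Map<String, String> export(String prefix, List<String> items) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            // numbering starts at 1, not 0
            vars.put(prefix + "_" + (i + 1), items.get(i));
        }
        return vars;
    }

    /** Reads prefix_1, prefix_2, ... and stops at the first missing name. */
    static List<String> iterate(Map<String, String> vars, String prefix) {
        List<String> seen = new ArrayList<>();
        for (int i = 1; vars.containsKey(prefix + "_" + i); i++) {
            seen.add(vars.get(prefix + "_" + i));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("cell A", "cell B", "cell C");
        Map<String, String> vars = export("responseList", cells);
        System.out.println(vars.keySet());
        System.out.println(iterate(vars, "responseList"));
    }
}
```

In a BeanShell PostProcessor the same loop would call vars.put(...) directly instead of filling a Map, and only String values would be stored this way.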
> Also, you say you have an ArrayList and are using
>                vars.put("responseList", responseList);
> That won't work: this method takes (String, String). If you need to store
> objects, you have to use vars.putObject(key, object);
>
> While working with BSH always check your jmeter.log for errors.
>
> regards
> deepak
>
>
> On Fri, Nov 20, 2009 at 7:44 AM, rosiere <[email protected]> wrote:
>
>>
>> Hello,
>>
>> Thanks for your explanation.
>> In fact the HTML layout that I am trying to parse is stable and unlikely
>> to change in the future, which is why I want to parse it.
>>
>> Since I'm not good at regex, I will use JMeter just to get the HTML
>> response from an https-based web site, and to store the parsing results in
>> Java objects like ArrayList.
>>
>> So I created some HTTP Request samplers, then attached a BeanShell
>> PostProcessor.
>> In the BeanShell script, I wrote some logic with the org.w3c.dom and JTidy
>> APIs, and now I can see the extracted cell contents via
>> System.err.println() in my BeanShell script.
>>
>> After that I had difficulties with the use of JMeter variables. In my
>> BeanShell script I created ArrayList objects, stored the extracted texts
>> in them, and put them into the JMeter context:
>>                vars.put("responseList", responseList);
>>                vars.put("responseDateList", responseDateList);
>> http://old.nabble.com/file/p26443545/BeanShellPostProcessor.gif
>>
>> After having parsed my HTML response, I would need a ForEach Controller to
>> iterate on these List objects' elements (which are just an array of values
>> in selected <td> elements), and to issue JDBC request to store them in
>> database (or any other possible operations to send them out of JMeter).
>> http://old.nabble.com/file/p26443545/ForEachController.gif
>>
>> However, I was unable to get a ForEach Controller to operate on objects in
>> vars.
>>
>> What did I miss, and what should I do to iterate over the contents of vars
>> and run a sampler on each value in the iteration?
>>
>> With my best wishes,
>>
>> Rosière
>>
>>
>> Deepak Shetty wrote:
>> >
>> > Hi
>> > The regex you are using doesn't seem correct.
>> > [^tr]
>> > matches any single character that is neither 't' nor 'r'; it doesn't
>> > mean "not the sequence tr".
>> >
>> > Plus, if you are getting multiple <tr> elements instead of the one you
>> > expect, your regex is probably too greedy. Try replacing .* constructs
>> > with .*? or modify the regex.
>> >
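
The greedy versus reluctant difference is easy to see in a few lines of Java. This sketch uses java.util.regex rather than the ORO engine that JMeter uses, on the assumption that both treat .* and .*? the same way; the class name and the two-row sample string are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyRowRegexSketch {

    /** Collects capture group 1 of every match of regex in html. */
    static List<String> rows(String html, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<tr><td>a</td></tr>\n<TR><td>b</td></TR>";

        // Greedy: .* runs to the LAST </tr>, so both rows land in one group.
        System.out.println(rows(html, "(?is)<tr>(.*)</tr>").size());

        // Reluctant: .*? stops at the FIRST </tr>, giving one match per row.
        System.out.println(rows(html, "(?is)<tr>(.*?)</tr>"));

        // [^tr] is a character class: one character that is neither 't' nor 'r'.
        System.out.println(Pattern.matches("[^tr]", "d"));
        System.out.println(Pattern.matches("[^tr]", "t"));
    }
}
```

With the greedy pattern the sample yields a single match spanning both rows; with the reluctant one it yields two matches, one per row.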
>> > In any case XPath is as dependent on the HTML structure as a regex is
>> > (e.g. what if you move to a tableless layout?).
>> >
>> >
>> > regards
>> > deepak
>> >
>> > On Thu, Nov 19, 2009 at 8:17 AM, rosiere <[email protected]> wrote:
>> >
>> >>
>> >> Hello,
>> >>
>> >> Thanks for your advice.
>> >>
>> >> I did apply a case-insensitive check, like this:
>> >>
>> >> (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>> >>
>> >> However, I still face a problem. Now I capture all <tr> elements in a
>> >> single group instead of one group per <tr> element.
>> >>
>> >> I read in my jmeter.log this information about the matching:
>> >>
>> >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex = (?is)<tr\sclass="tgDataLine.*1\)\" >([^tr].*)</tr>
>> >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Match found!
>> >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #0 = 1
>> >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: RegexExtractor: Template piece #1 =
>> >> 2009/11/19 17:03:33 DEBUG - jmeter.extractor.RegexExtractor: Regex Extractor result =
>> >> <TD>....<TD>
>> >> <TR>...</TR>
>> >> ...
>> >> <TR>....</TR>
>> >> <TD>
>> >>
>> >>
>> >> As for alternatives, I did want to parse the HTML with the org.w3c.dom
>> >> API, but DOM methods like getElementsByTagName() are all case sensitive
>> >> and may not be able to parse an HTML page with both uppercase and
>> >> lowercase tags.
>> >>
>> >> Besides, whenever the HTML page changes, I will have to rewrite my Java
>> >> code based on the DOM API. So in order to minimize these unwanted
>> >> effects on my Java code, I would still like to use regex, so that
>> >> whenever the HTML structure changes, I need only change the regex in
>> >> JMeter and not the Java code that consumes the extracted HTML portions.
>> >>
>> >>
>> >>
>> >> Deepak Shetty wrote:
>> >> >
>> >> > You should probably make the check case insensitive, but I agree with
>> >> > sebb: parsing HTML constructs with regex is a pain and breaks quite
>> >> > frequently.
>> >> > regards
>> >> > deepak
>> >> >
>> >> > On Wed, Nov 18, 2009 at 10:37 AM, Andre Arnold <[email protected]> wrote:
>> >> >
>> >> >> sebb schrieb:
>> >> >> > On 18/11/2009, rosiere <[email protected]> wrote:
>> >> >> >
>> >> >> >>  Hello,
>> >> >> >>
>> >> >> >>  I found that JMeter's ORO regex is somewhat different from Java's.
>> >> >> >>
>> >> >> >
>> >> >> > Yes.
>> >> >> >
>> >> >> > But not all that different; and neither is particularly well suited
>> >> >> > to this task.
>> >> >> >
>> >> >> > The XPath Extractor will probably be much easier to use.
>> >> >> >
>> >> >> >
>> >> >> > http://jakarta.apache.org/jmeter/usermanual/component_reference.html#XPath_Extractor
>> >> >> >
>> >> >> > This was discussed on the mailing list earlier this year.
>> >> >> >
>> >> >> >
>> >> >> >>  Now I need to iterate over the different <tr> elements that match a
>> >> >> >>  pattern, then capture all the <td> elements within each <tr>, and
>> >> >> >>  select the 8th and 9th <td>.
>> >> >> >>
>> >> >> >>  Since many <tr> elements appear in the HTML response, in order to do
>> >> >> >>  this I have to capture the <tr> elements one by one, without
>> >> >> >>  including two of them in the same group:
>> >> >> >>
>> >> >> >>  so I should avoid capturing consecutive <tr>..</tr><tr>..</tr> into
>> >> >> >>  the same group.
>> >> >> >>
>> >> >> >>  By writing (?is)<tr\sclass="tgDataLine.*1\)\" >(.*)</tr> I capture
>> >> >> >>  only one group that contains many real <tr> elements.
>> >> >> >>  So what should I write in the regex?
>> >> >> >>
>> >> >> >>
>> >> >> If you still need a pattern to match your needs:
>> >> >> I found that the following matches the number you wanted and the
>> >> >> following column value.
>> >> >>
>> >> >> reference: ref
>> >> >> pattern:     (?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>
>> >> >> template:  $1$$2$
>> >> >> match :     1
>> >> >>
>> >> >> In ref_g1 you'll find the number.
>> >> >> In ref_g2 you'll find the following column value.
>> >> >>
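
To see what the two groups capture, here is a small Java sketch that runs a variant of that pattern (with [0-9] as the digit class) over an invented two-row sample and walks through every match, which is what the match counter does in the test plan. It uses java.util.regex, not JMeter's Regular Expression Extractor, and the class name and sample markup are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowGroupsSketch {

    // Group 1: the number in the first cell; group 2: the text of the next cell.
    static final Pattern ROW =
            Pattern.compile("(?s)<TR.+?<TD.+?>([0-9]+?)</TD.+?<TD.+?>(.+?)</TD>");

    /** Returns one {g1, g2} pair per match, in document order. */
    static List<String[]> extract(String html) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            pairs.add(new String[] {m.group(1), m.group(2)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical markup: upper-case tags, every TD carries an attribute.
        String html =
                "<TR class=\"x\"><TD class=\"a\">313609133</TD><TD class=\"b\">foo</TD></TR>\n"
              + "<TR class=\"x\"><TD class=\"a\">42</TD><TD class=\"b\">bar</TD></TR>";
        int matchNumber = 1;
        for (String[] pair : extract(html)) {
            System.out.println("match " + matchNumber++ + ": g1=" + pair[0] + " g2=" + pair[1]);
        }
    }
}
```

Note that the pattern as written needs at least one character between <TD and >, so a bare <TD> cell would not be matched the same way.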
>> >> >> To catch all the matches you need to increment a counter for the
>> >> >> match number and check whether there is another one or not.
>> >> >>
>> >> >> Your test plan should look something like this:
>> >> >>
>> >> >> -while controller (${__javaScript("${ref}"!="error")}  )
>> >> >> --counter (from 1 with increment 1 for the regex match value)
>> >> >> --Http Sampler (to get your site)
>> >> >> ---RegEx Extractor (as shown above)
>> >> >> --if controller( same as while controller--> "${ref}"!="error" )
>> >> >> ---your jdbc action (use ref_g1 & ref_g2)
>> >> >>
>> >> >>
>> >> >> Hope I got your problem right.
>> >> >>
>> >> >>
>> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: [email protected]
>> >> >> For additional commands, e-mail:
>> [email protected]
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26421379.html
>> >> Sent from the JMeter - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/How-can-I-extract-cell-data-%28content-surrounded-by-%3Ctd%3E%3C-td%3E%29-from-a-%3Ctable%3E-in-HTML-response--tp26371440p26443545.html
>> Sent from the JMeter - User mailing list archive at Nabble.com.
>>
>>
>>
>>
>
