What is it that you want to select? All the columns that are not titles?
That would be something like
//tbody/tr/td/span (but this will flatten out the structure).
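
To show what "flatten" means for a path like the one above, here is a minimal sketch in Python using only the standard library. It is not JMeter code, and the SAMPLE markup is a trimmed, well-formed stand-in for the quoted table (the even-row class name is an assumption). The real page is not well-formed XML (unclosed img tags, &nbsp;), so it would need a tolerant HTML parser first. The helper names extract_rows and extract_flat are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the quoted table. Most attributes
# and spacer images are trimmed; "sbListEvenCell" is an assumed class name.
SAMPLE = """
<table>
  <tr>
    <th class="sbListHeaderCell"><span class="sbListHeaderText">Title1</span></th>
    <th class="sbListHeaderCell"><span class="sbListHeaderText">Title2</span></th>
  </tr>
  <tr>
    <td class="sbListOddCell"><span class="sbListText"><a title="data1">data1</a></span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListOddCell"><span class="sbListText">data2</span></td>
  </tr>
  <tr>
    <td class="sbListEvenCell"><span class="sbListText">data3</span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListEvenCell"><span class="sbListText">data4</span></td>
  </tr>
</table>
"""


def extract_flat(markup):
    """One path over the whole table: every data span, row boundaries lost."""
    root = ET.fromstring(markup)
    return ["".join(span.itertext()).strip()
            for span in root.findall(".//tr/td/span")]


def extract_rows(markup):
    """Select rows first, then cells relative to each row, to keep structure."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.findall(".//tr"):
        # Header rows use <th>, so they yield no td/span matches and are skipped.
        cells = ["".join(span.itertext()).strip()
                 for span in tr.findall("./td/span")]
        if cells:
            rows.append(cells)
    return rows


flat = extract_flat(SAMPLE)
rows = extract_rows(SAMPLE)
print(flat)
print(rows)
```

The first function returns a single list of cell texts. The second evaluates the cell path once per row, which is one way to keep the row/column grouping that the two-loop approach relies on.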

regards
deepak

On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <mailinglist...@gmail.com> wrote:

> Felix,
>
> I'll have a look at XPath. It looks interesting, but I can't find any
> example code for XPath.
> Thank you
> Thanh
>
> PS: This is the table I'm working on. The 1st row is the title. The 2nd row
> contains data. I want to extract data1, data2, ... The regular expression
> reads row by row. In the BeanShell I do 2 loops: one for each row and one
> for each column. There are odd-numbered rows and even-numbered rows.
>
>
> <table>
> <tr><th class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img
> alt="" height="5" src="/assets/common/img/cnr_t_tl.gif" width="5"></th><th
> class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_name')" onclick="submitForm1023(event);return
> false;" title="Sort by column Title">Title1</a></span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title2</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_deliveryType')"
> onclick="submitForm1024(event);return false;" title="Sort by column
> Delivery
> Type">Title3</a></span></th><td class="sbListColumnSpacer"><img alt=""
> border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_regStartDate')"
> onclick="submitForm1025(event);return false;" title="Sort by column
> Registration Date">Title4</a></span></th><td
> class="sbListColumnSpacer"><img
> alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
> alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_completionStatus')"
> onclick="submitForm1026(event);return false;" title="Sort by column
> Completion Status">Title5</a></span></th><td
> class="sbListColumnSpacer"><img
> alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
> alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_completionDate')"
> onclick="submitForm1027(event);return false;" title="Sort by column Date
> Marked Complete">Title6</a></span></th><td class="sbListColumnSpacer"><img
> alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
> alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title7</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_score')" onclick="submitForm1028(event);return
> false;" title="Sort by column Score">Title8</a></span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_grade')" onclick="submitForm1029(event);return
> false;" title="Sort by column Grade">Title9</a></span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title10</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title11</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title12</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title13</span></th><td
> class="sbListColumnSpacer"><img alt="" border="0" height="1"
> src="/assets/common/img/1x1.gif" width="1"></td><th
> class="sbListHeaderCell"
> nowrap="true" scope="col"><img alt="" height="1"
> src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText"><a class="sbListHeaderText"
> href="javascript:void('sort_startDate')"
> onclick="submitForm1030(event);return false;" title="Sort by column
> Offering
> Start Date">Title14</a></span></th><td class="sbListColumnSpacer"><img
> alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
> alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> class="sbListHeaderText">Title15</span></th><th align="right"
> class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img alt=""
> height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th></tr>
>
> <tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
> class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false"
> href="javascript:void('titleLink')" onclick="submitForm1031(event);return
> false;" title="data1">data1</a></span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">&nbsp;</span></td><td
> class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">data2</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">data3</span></td><td class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span
> class="sbListText">data4</span><br><a class="sbLinkTableDisplay"
> doTruncate="false" href="javascript:void('blah')"
> onclick="submitForm1033(event);return false;" title="blah
> blah">blah</a></span></td><td class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">data5</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">&nbsp;</span></td><td
> class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">&nbsp;</span></td><td
> class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">data6</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">data7</span></td><td class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">data8</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> class="sbListText">data8</span></td><td class="sbListColumnSpacer"></td><td
> class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> class="sbListColumnSpacer"></td><td class="sbListOddCell" nowrap><a
> class="sbLinkTableDisplay" doTruncate="false"
> href="javascript:void('editLink')" onclick="submitForm1035(event);return
> false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay"
> doTruncate="false" href="javascript:void('deleteLink')"
> onclick="submitForm1036(event);return false;"
> title="Delete">Delete</a><br><br></td><td
> class="sbListOddCellEnd"></td></tr><tr>
>
> </table>
>
>
>
> On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <f...@mpexnet.de> wrote:
>
> > On 01/24/2011 04:27 PM, thanh nguyen wrote:
> > > Hi everyone,
> > >
> > > I have a big HTML table from which I need to extract data. The table has
> > > several columns. The regular expression required to do the extraction job
> > > is very long and complex. The code is hard to debug and to maintain. I'd
> > > like to know what the alternatives are. Is there an HTML parser that
> > > creates DOM objects? I could program a postprocessor in BeanShell...
> > >
> > > Thanks a lot
> >
> > That would be the XPath Extractor, but maybe someone can help you build
> > a simpler regex instead (you need to share more details for this to
> > happen).
> >
> > Regards,
> > Felix
> >
>
