What exactly do you want to select? All the columns that are not titles? That would be something like //tbody/tr/td/span (but this will flatten out the structure).
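Since you asked for example code, here is a minimal sketch of what that expression does, run outside JMeter with Python's standard library. It is an illustration only: the SAMPLE markup is a hypothetical, well-formed miniature of your table, the sbListEvenCell class is my guess at what the even rows use, and I added a <tbody> so the path above matches (your posted table has none, so depending on whether the parser inserts one you may need //table/tr/td/span instead). The second query shows one way to keep the row structure: select the rows first, then the spans within each row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the posted table (assumption:
# even rows use an "sbListEvenCell" class; <tbody> added for the example).
SAMPLE = """
<table>
  <tbody>
    <tr>
      <th><span class="sbListHeaderText">Title1</span></th>
      <th><span class="sbListHeaderText">Title2</span></th>
    </tr>
    <tr>
      <td class="sbListOddCell"><span class="sbListText">data1</span></td>
      <td class="sbListColumnSpacer"></td>
      <td class="sbListOddCell"><span class="sbListText">data2</span></td>
    </tr>
    <tr>
      <td class="sbListEvenCell"><span class="sbListText">data3</span></td>
      <td class="sbListColumnSpacer"></td>
      <td class="sbListEvenCell"><span class="sbListText">data4</span></td>
    </tr>
  </tbody>
</table>
"""

root = ET.fromstring(SAMPLE)

# Equivalent of //tbody/tr/td/span: one flat list, row boundaries are lost.
flat = [span.text for span in root.findall(".//tbody/tr/td/span")]

# Row-preserving variant: select the rows, then the spans inside each row.
rows = [
    [span.text for span in tr.findall("td/span")]
    for tr in root.findall(".//tbody/tr")
]
# The header row only has <th> cells, so it yields an empty list; drop it.
rows = [r for r in rows if r]

print(flat)
print(rows)
```

In JMeter itself you would put only the XPath expression into the XPath Extractor; the surrounding Python is just there to make the example runnable.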
regards
deepak

On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <mailinglist...@gmail.com> wrote:

> Felix,
>
> I'll have a look at XPath. It looks interesting, but I can't find any
> example code for XPath.
> Thank you
> Thanh
>
> PS: this is the table I'm working on. The 1st row holds the titles. The 2nd
> row contains data. I want to extract data1, data2, ... The regular
> expression reads row by row. In the BeanShell script I use two loops: one
> over the rows and one over the columns. There are odd-numbered rows and
> even-numbered rows.
>
> <table>
> <tr>
> <th class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img alt="" height="5" src="/assets/common/img/cnr_t_tl.gif" width="5"></th>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_name')" onclick="submitForm1023(event);return false;" title="Sort by column Title">Title1</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title2</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_deliveryType')" onclick="submitForm1024(event);return false;" title="Sort by column Delivery Type">Title3</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_regStartDate')" onclick="submitForm1025(event);return false;" title="Sort by column Registration Date">Title4</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_completionStatus')" onclick="submitForm1026(event);return false;" title="Sort by column Completion Status">Title5</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_completionDate')" onclick="submitForm1027(event);return false;" title="Sort by column Date Marked Complete">Title6</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title7</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_score')" onclick="submitForm1028(event);return false;" title="Sort by column Score">Title8</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_grade')" onclick="submitForm1029(event);return false;" title="Sort by column Grade">Title9</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title10</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title11</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title12</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title13</span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText"><a class="sbListHeaderText" href="javascript:void('sort_startDate')" onclick="submitForm1030(event);return false;" title="Sort by column Offering Start Date">Title14</a></span></th>
> <td class="sbListColumnSpacer"><img alt="" border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td>
> <th class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span class="sbListHeaderText">Title15</span></th>
> <th align="right" class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img alt="" height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th>
> </tr>
>
> <tr>
> <td class="sbListOddCellEnd"></td>
> <td class="sbListOddCell"><span class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false" href="javascript:void('titleLink')" onclick="submitForm1031(event);return false;" title="data1">data1</a></span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText"> </span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data2</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data3</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span class="sbListText">data4</span><br><a class="sbLinkTableDisplay" doTruncate="false" href="javascript:void('blah')" onclick="submitForm1033(event);return false;" title="blah blah">blah</a></span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data5</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText"> </span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText"> </span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText"> </span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data6</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data7</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data8</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText">data8</span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell"><span class="sbListText"> </span></td>
> <td class="sbListColumnSpacer"></td>
> <td class="sbListOddCell" nowrap><a class="sbLinkTableDisplay" doTruncate="false" href="javascript:void('editLink')" onclick="submitForm1035(event);return false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay" doTruncate="false" href="javascript:void('deleteLink')" onclick="submitForm1036(event);return false;" title="Delete">Delete</a><br><br></td>
> <td class="sbListOddCellEnd"></td>
> </tr><tr>
>
> </table>
>
> On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <f...@mpexnet.de> wrote:
>
> > On 01/24/2011 04:27 PM, thanh nguyen wrote:
> > > Hi everyone,
> > >
> > > I have a big HTML table from which I need to extract data. The table
> > > has several columns. The regular expression required to do the
> > > extraction job is very long and complex. The code is hard to debug and
> > > to maintain. I'd like to know what the alternatives are. Is there an
> > > HTML parser that creates DOM objects? I could program a postprocessor
> > > in BeanShell...
> > >
> > > Thanks a lot
> >
> > That would be the XPath Extractor, but maybe someone can help you build
> > a simpler regex instead (you need to share more details for this to
> > happen).
> >
> > Regards,
> > Felix