Hi Jordi,

I couldn't download the attachment you added to the bug. Can you send it to me directly? I'll try to get to it next month, after the holidays.
Thanks.

peter

--- Jordi Salvat i Alabart <[EMAIL PROTECTED]> wrote:
> No, it doesn't. JTidy works well.
>
> I'm suspecting your guess is wrong... :-)
>
> --
> Salut,
>
> Jordi.
>
> peter lin wrote:
> > Can you verify if the old JTidy implementation
> > contains the same bug?
> >
> > I'm going to guess it's how I'm using htmlparser.
> >
> > peter
> >
> > --- Jordi Salvat i Alabart <[EMAIL PROTECTED]> wrote:
> >
> >> Responding to myself again...
> >>
> >> I've been running some more tests with JVM arguments that I
> >> believe are more sensible, namely:
> >>
> >> -Xms256m -Xmx256m -XX:NewSize=64m -XX:MaxNewSize=64m
> >> -XX:MaxLiveObjectEvacuationRatio=40 -XX:SurvivorRatio=8
> >>
> >> With this, the performance difference has almost disappeared:
> >> I'm getting ca. 12 samples/second with the htmlparser and
> >> 15 samples/second with the regexp approach. The htmlparser
> >> solution generates about 5 times more garbage than the regexp
> >> solution, which explains why the results were so tremendously
> >> different using -Xincgc.
> >>
> >> In this situation, I don't believe it's worth providing users
> >> with the ability to choose which parser they want. I won't
> >> remove them now, but I believe HtmlParser is the best choice,
> >> once we have managed to clean up the outstanding bugs.
> >>
> >> The bugs I mentioned before (failure to parse a couple of
> >> image URLs) still hold. I'll file them now.
> >>
> >> --
> >> Salut,
> >>
> >> Jordi.
> >>
> >> Jordi Salvat i Alabart wrote:
> >>
> >>> Hi.
> >>>
> >>> I've finally found some time to test the performance of the
> >>> HTTPSamplerFull implementation currently in CVS (developed by
> >>> Peter Lin using HTMLParser) against the implementation I sent
> >>> a while ago to the list (developed by me using Regexps).
> >>> [Remember: the objective is not to decide which is best, but
> >>> whether it's worth having both available to script developers].
> >>>
> >>> The results are not conclusive, but they prove that the issue
> >>> deserves further analysis:
> >>>
> >>> 1/ On the example I've been using, the Regexp-based
> >>> implementation was more accurate than the HTMLParser-based
> >>> one. This is very surprising to me, since I expected the
> >>> Regexp-based implementation to be generally less accurate.
> >>> I'll need some help on this one. More details later.
> >>>
> >>> 2/ On the example I've been using, the Regexp-based
> >>> implementation was at least 7 times faster than the
> >>> HTMLParser-based one. A quick look at the code suggests that
> >>> the HTML Parser is being called 5 times (once for each tag of
> >>> interest: img, applet, input, body, table). Am I correct?
> >>> The regexp-based implementation only scans through the HTML
> >>> once. This could well explain most of the performance
> >>> difference. Is there any way to recode the HTMLParser-based
> >>> implementation to do the job in a single scan?
> >>>
> >>> How to reproduce the test:
> >>> - Get Apache and JMeter running (I'm running both on the same
> >>>   box, which is probably a bad idea).
> >>> - Uncompress the attached test-httpsamplerfull.tgz in the
> >>>   Apache docroot. It contains a Yahoo home page saved using
> >>>   Mozilla 1.5. (A proper test would use several other samples).
> >>> - Run the attached script and look at the Rate in the
> >>>   Aggregate Report.
> >>>
> >>> On my IBM T30 with Pentium 4 M @ 2.2 GHz, 1 GB RAM, with JDK
> >>> 1.4.2_02, no fiddling with the java arguments (yes, that means
> >>> I'm using -Xincgc, which is probably the worst possible
> >>> choice) I'm getting around 1 sample/second with the
> >>> HTMLParser-based sampler and around 7 samples/second with the
> >>> Regexp-based one.
> >>>
> >>> In addition, the HTMLParser-based implementation is failing to
> >>> download two images: powrdbyhp_blu_84x28_yahoo.gif (it is
> >>> downloading the HTML page again instead) and 031121_l300.gif
> >>> (it downloads nothing). I've used Mozilla's "Live HTTP
> >>> Headers" to see what Mozilla does and it matches what the
> >>> Regexp-based implementation is doing. I'd say there's a
>
=== message truncated ===
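For anyone following the single-scan point in the quoted message, here is a rough sketch of what "scan the HTML once for all five tags" could look like with java.util.regex. It is not the code from either sampler implementation: the class name SingleScanExtractor, the method extractUrls, and the choice of attribute per tag (src for img/input, code for applet, background for body/table) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one pass over the document for all tags of interest.
public class SingleScanExtractor {

    // Matches an opening img, input, applet, body or table tag.
    // Group 1 is the tag name, group 2 the raw attribute text.
    // Limitation: an attribute value containing '>' ends the match early.
    private static final Pattern TAG = Pattern.compile(
            "<(img|input|applet|body|table)\\b([^>]*)>",
            Pattern.CASE_INSENSITIVE);

    private static final Pattern SRC = attr("src");
    private static final Pattern CODE = attr("code");
    private static final Pattern BACKGROUND = attr("background");

    // Builds a pattern for name="v", name='v' or name=v.
    // The lookbehind stops "src" from matching inside e.g. "data-src".
    private static Pattern attr(String name) {
        return Pattern.compile(
                "(?<![\\w-])" + name
                        + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns the resource references found, in document order.
    public static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<String>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            String name = tag.group(1).toLowerCase(Locale.ROOT);
            Pattern wanted;
            if (name.equals("applet")) {
                wanted = CODE;          // assumed attribute for applets
            } else if (name.equals("body") || name.equals("table")) {
                wanted = BACKGROUND;    // assumed attribute for body/table
            } else {
                wanted = SRC;           // img and input
            }
            Matcher a = wanted.matcher(tag.group(2));
            if (a.find()) {
                // Exactly one of the three value groups is non-null.
                String value = a.group(1) != null ? a.group(1)
                        : a.group(2) != null ? a.group(2) : a.group(3);
                urls.add(value);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<body background=\"bg.gif\">"
                + "<IMG alt='x' SRC='a.gif'>"
                + "<input type=image src=b.gif></body>";
        System.out.println(extractUrls(html));
    }
}
```

The document is walked by a single Matcher, and the small per-tag attribute patterns only ever run on the attribute text of a tag that already matched. Whether an HTMLParser-based version can be restructured the same way is the open question from the thread; this sketch only illustrates the regexp side.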