I was not thinking about using regexps instead of a decent HTML parser, but if they were really faster, it could well be worth having both methods available. It would need to be _really_ faster to be worth the hassle, but from experience I know it could well be (although you also gave reasons to think it won't be).

You're right that HTML is dirty and the regexps will be difficult, but I'm familiar with the issue and already have some that I've used in Perl scripts. For example, this gets all image URIs:

(?si)<IMG(?=\s)[^\>]*?\sSRC\s*=\s*"([^">]*)"
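For comparison on the Java side, here is a minimal sketch of how the image pattern above might be applied with java.util.regex. The class and method names (ImageUriExtractor, extract) are hypothetical and not part of JMeter; the pattern is the one above with Java string escaping, and like the original it only handles double-quoted SRC values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not JMeter code).
final class ImageUriExtractor {
    // Same pattern as above; (?si) = DOTALL + CASE_INSENSITIVE.
    // Compiled once, since Pattern.compile is comparatively expensive.
    private static final Pattern IMG_SRC = Pattern.compile(
        "(?si)<IMG(?=\\s)[^>]*?\\sSRC\\s*=\\s*\"([^\">]*)\"");

    // Returns the SRC value of every <IMG> tag found, in document order.
    static List<String> extract(String html) {
        List<String> uris = new ArrayList<String>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            uris.add(m.group(1)); // group 1 is the quoted SRC value
        }
        return uris;
    }
}
```

Calling ImageUriExtractor.extract(responseText) on a sampled page would return the list of image URIs to fetch; unquoted or single-quoted attributes are silently skipped.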

Others are more difficult -- for example stylesheets:

m{(?si)<LINK(?=\s)(?:[^\>]*?\s(?:HREF\s*=\s*"([^">]*)"|REL\s*=\s*"stylesheet")){2,}}g
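The single-pattern version above depends on a capture group inside a repeated alternation. A possibly simpler alternative in Java is a two-step scan: match each LINK tag first, then test the tag text for the two attributes separately. This is a different technique from the Perl one-liner, shown only as a sketch; the names StylesheetUriExtractor and extract are hypothetical, and it shares the assumption of double-quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not JMeter code).
final class StylesheetUriExtractor {
    // Step 1: grab each whole <LINK ...> tag.
    private static final Pattern LINK_TAG =
        Pattern.compile("(?si)<LINK(?=\\s)[^>]*>");
    // Step 2: look inside the tag for the two attributes, in any order.
    private static final Pattern REL_STYLESHEET =
        Pattern.compile("(?si)\\sREL\\s*=\\s*\"stylesheet\"");
    private static final Pattern HREF =
        Pattern.compile("(?si)\\sHREF\\s*=\\s*\"([^\">]*)\"");

    // Returns the HREF of every <LINK> tag that also has REL="stylesheet".
    static List<String> extract(String html) {
        List<String> uris = new ArrayList<String>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String text = tag.group();
            Matcher href = HREF.matcher(text);
            if (REL_STYLESHEET.matcher(text).find() && href.find()) {
                uris.add(href.group(1));
            }
        }
        return uris;
    }
}
```

The extra pass over each tag costs a little, but it avoids matching a tag that merely repeats one attribute twice, which the {2,} quantifier would accept.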

I'll give it a shot so that we can compare -- it's important, because I've seen that processing responses is one of JMeter's biggest CPU hogs. We will probably be able to use the results for extractors, too.

--
Salut,

Jordi.

peter lin wrote:

I'm not convinced a regexp approach would be better than HtmlParser for a couple of reasons.

- HtmlParser already works on the stream directly
using readers.

- java regexp is decent, but not blazing fast like
perl regexp.

- to make it easy to extend, regexp isn't ideal.

- html is dirty, so a developer would need sufficient
expertise with regexp to get it to work correctly.

- I'm not a regexp guru, but if someone else is
willing to try to write a generalized package for
scanning specific tags that can handle dirty html, it
would be great.

- I'd rather write an HTML compiler reading the bytes
directly than use regexp.

- HtmlParser is sufficiently fast and efficient that I
think it is a good candidate to replace tidy. Plus I
don't like having to build DOM just to get the images.

I'm open to ideas. If no one objects, I will continue
as planned and complete the new sampler using
HtmlParser.

peter


--- Jordi Salvat i Alabart <[EMAIL PROTECTED]> wrote:


My experience with -Xincgc is that it never helps:
the overhead it adds is so huge that the shorter GC pauses never
compensate for it.


Have you thought about a regexp-based
implementation? It would be less correct, but probably good enough, and possibly much
faster.


--
Salut,

Jordi.

peter lin wrote:

I ran some benchmarks today with a new version of httpsamplerfull using HtmlParser. The results are interesting. Perhaps the biggest and most interesting discovery for me is the dramatic difference in performance between running with and without -Xincgc.


http://tao.altern8.net:8080/comparison_summary.pdf


the results are in pdf format.

When I run JMeter with incremental GC, the HtmlParser version beats Tidy easily, but without incremental GC, the performance gain is marginal as the number of threads increases.


It would appear incremental GC hinders DOM and Tidy performance and results in a steady increase in heap size. Without incremental GC, the response time with HtmlParser is generally 5-10% faster than with Tidy. Under which circumstances is using -Xincgc better for JMeter?


the jdk I am using is 1.4.1 on windows.


peter









---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







