Seems to me that there are several aspects to whether this is worth it:

1) the cost of hashing the page and maintaining the cache (including
the memory requirements)
2) the cost of parsing the page
3) the likelihood of cache hits.

It's only worth the effort if test plans are likely to result in cache hits.
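The trade-off in points 1) to 3) can be put as a rough break-even rule: every page pays the hash-and-lookup cost, but only the fraction that hits the cache saves a parse. The sketch below assumes all costs are CPU time per page in the same unit; the class name and the numbers in `main` are illustrative only, not measurements, and memory cost is deliberately left out of the model.

```java
/**
 * Hypothetical back-of-the-envelope model: is caching parse results worth it?
 * Costs are per page, in the same unit (e.g. microseconds of CPU).
 * Memory usage is NOT modelled here.
 */
public class CacheBreakEven {

    /**
     * Expected CPU saved per page: every page is hashed and looked up,
     * but only a fraction hitRate of pages skips the parse.
     * A negative result means caching costs more than it saves.
     */
    static double expectedSavingPerPage(double hitRate, double parseCost, double hashAndLookupCost) {
        return hitRate * parseCost - hashAndLookupCost;
    }

    /** Hit rate at which caching exactly pays for its own overhead. */
    static double breakEvenHitRate(double parseCost, double hashAndLookupCost) {
        return hashAndLookupCost / parseCost;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements
        double parse = 1000.0;
        double hashAndLookup = 200.0;
        System.out.println(breakEvenHitRate(parse, hashAndLookup));            // 0.2
        System.out.println(expectedSavingPerPage(0.5, parse, hashAndLookup));   // 300.0
        System.out.println(expectedSavingPerPage(0.125, parse, hashAndLookup)); // -75.0
    }
}
```

In words: if hashing plus lookup costs a fifth of a parse, at least one page in five must be a cache hit before the cache pays for itself.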

As has been pointed out, the content behind a CSS URL is generally
constant, so the likelihood of cache hits depends on how often the same
CSS content is encountered.
For a specific site this will generally be quite high, so caching
CSS can make sense.

However, for HTML pages even the same URL often returns different content
(timestamps, cookies, etc.).
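To make the CSS/HTML contrast concrete, here is a minimal sketch of a cache keyed by a SHA-256 hash of the response body, bounded with a simple LRU policy so memory stays capped. It is not a JMeter API: the class, method names and the `parser` callback are hypothetical. It shows why identical content (typical for CSS) hits, while a page that differs only by a timestamp hashes to a new key and misses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical parse-result cache keyed by a content hash (illustrative only). */
public class ParsedResourceCache {
    private final Map<String, List<String>> cache;
    private long hits;
    private long misses;

    public ParsedResourceCache(int maxEntries) {
        // accessOrder=true plus removeEldestEntry gives a simple LRU bound on memory
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    static String sha256Hex(String body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    /** Returns cached links for this body, running the (expensive) parser only on a miss. */
    public synchronized List<String> linksFor(String body, Function<String, List<String>> parser) {
        String key = sha256Hex(body); // the hashing cost is paid on every call, hit or miss
        List<String> links = cache.get(key);
        if (links != null) {
            hits++;
            return links;
        }
        misses++;
        links = parser.apply(body);
        cache.put(key, links);
        return links;
    }

    public synchronized long hits() { return hits; }
    public synchronized long misses() { return misses; }
}
```

Note that every lookup pays for the hash, which is exactly cost 1) above; the `synchronized` methods are the simplest way to share one instance across JMeter threads, at the price of contention.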

One way to find out would be to measure this for some existing
real-world test plans.

This could be done with a simple Listener/Post-Processor that does the
hashing for each HTML page and logs the results. The hashes could be
extracted from the log and used to derive stats for the potential
cache hits.
(Or the listener could compute the stats itself, but that would increase
complexity and resource usage. Or one could use the existing Save Responses
Listener and post-process the files, but that would require a lot more
storage.)
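The offline stats step could look roughly like the sketch below. It assumes a hypothetical log format of one `label<TAB>hash` line per HTML response (the real listener output would need to be designed), and it models an unbounded cache shared by all samplers, so every repeat of an already-seen hash counts as a potential hit.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical offline analysis of logged page hashes.
 * Assumed input: one "label<TAB>hash" line per HTML response.
 */
public class CacheHitStats {

    /** Potential hit rate per sampler label, assuming an unbounded cache keyed by hash only. */
    static Map<String, Double> hitRateByLabel(List<String> logLines) {
        Map<String, Integer> total = new HashMap<>();
        Map<String, Integer> hits = new HashMap<>();
        Set<String> seen = new HashSet<>(); // shared cache: a hash seen from any sampler counts
        for (String line : logLines) {
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                continue; // skip malformed lines
            }
            String label = parts[0];
            String hash = parts[1];
            total.merge(label, 1, Integer::sum);
            if (!seen.add(hash)) { // add() returns false if the hash was already present
                hits.merge(label, 1, Integer::sum);
            }
        }
        Map<String, Double> rates = new HashMap<>();
        for (Map.Entry<String, Integer> e : total.entrySet()) {
            int h = hits.getOrDefault(e.getKey(), 0);
            rates.put(e.getKey(), (double) h / e.getValue());
        }
        return rates;
    }
}
```

A per-label breakdown matters because a plan may mix samplers that always return the same page with samplers whose pages are never repeated; an overall average would hide that.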

I don't think it's worth proceeding without some data that shows cache
hits are sufficiently frequent in practice.


On 11 August 2016 at 20:44, Philippe Mouawad <[email protected]> wrote:
> On Thu, Aug 11, 2016 at 9:36 PM, Vladimir Sitnikov <[email protected]> wrote:
>
>> Philippe Mouawad>
>>
>> > "certain" in my sentence does not mean "certainty" :-), at least from what
>> > I understand of English.
>> >
>>
>> Of course, I meant "please provide some measurements of the parsing
>> overhead" :-)
>>
>> Philippe Mouawad>
>>
>> > It means more "an impact of a certain degree".
>> > No numbers, more the reasoning that parsing (based on Jodd or JSoup) comes
>> > at the cost of Regexp parsing, which I think certainly :-) has a cost,
>> > right?
>> >
>>
>> Do you have some numbers to compare?
>>
>
> No, before starting any work on this I wanted to get some feedback.
> I don't want to spend too much time on a potentially bad idea.
>
>
>
>> Of course HTML parsing is not free. The basic question is how much CPU it
>> takes, so we can analyze/compare/reproduce that.
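One rough way to put a number on "how much CPU it takes" is to time a parser with the JVM's per-thread CPU clock (`ThreadMXBean.getCurrentThreadCpuTime()`) after a warm-up. This is a naive sketch, not a rigorous benchmark (JMH would be the proper tool); the regex extractor below is a deliberately simple stand-in, not JMeter's actual Regexp parser, and a Jodd or JSoup based parser would be plugged in the same way through the `Function` argument.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseCpuCost {

    // Written to after timing so the JIT cannot treat the parsing work as dead code
    static volatile int blackhole;

    /** Average CPU nanoseconds per parser call on body, measured after a warm-up phase. */
    static double avgCpuNanos(Function<String, List<String>> parser, String body, int warmup, int iterations) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException("thread CPU time not available on this JVM");
        }
        int sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += parser.apply(body).size(); // let the JIT compile the hot path first
        }
        long start = mx.getCurrentThreadCpuTime();
        for (int i = 0; i < iterations; i++) {
            sink += parser.apply(body).size();
        }
        long elapsed = mx.getCurrentThreadCpuTime() - start;
        blackhole = sink;
        return (double) elapsed / iterations;
    }

    /** Deliberately naive src/href extractor, standing in for any parser under test. */
    static final Pattern SRC_OR_HREF = Pattern.compile("(?:src|href)=\"([^\"]+)\"");

    static List<String> regexLinks(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = SRC_OR_HREF.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String page = "<link href=\"a.css\"><img src=\"b.png\">".repeat(1000);
        double ns = avgCpuNanos(ParseCpuCost::regexLinks, page, 200, 1000);
        System.out.printf("regex: %.0f ns CPU per page%n", ns);
    }
}
```

Running the same harness over saved copies of real pages would give the per-page parse cost that the caching trade-off depends on.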
>>
>>  Philippe Mouawad>
>>
>> > That was my doubt. But take an e-commerce website where some of the users
>> > are browsing anonymously: don't you think a large share of the pages are
>> > similar?
>> > - product page
>> > - home page
>> > - category page
>> > ...
>> >
>> I do not have such experience, so I cannot tell what would be the hit rate.
>>
>>
>> Philippe> Maybe the user could somehow indicate when to optimize and when
>> not?
>>
>> That reminds me of
>> http://mrale.ph/blog/2015/01/11/whats-up-with-monomorphism.html
>> For instance: make each HTTP sampler store additional state.
>> The state is one of "unknown" (initial), "has duplicates" (that is when we
>> check the cache first), or "always unique" (avoid caching, as the sampler is
>> known to send unique outputs).
>>
>> So during the first several executions we estimate whether the sampler is
>> worth caching, then we switch into "has duplicates" or "always unique" mode.
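A minimal sketch of that per-sampler state machine, assuming each sampler instance owns one policy object and feeds it a hash of every response during the probe phase. The probe size and the "at least one duplicate" threshold are arbitrary assumptions, and the class is hypothetical rather than anything in JMeter today.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-sampler caching mode with the three states described above. */
public class AdaptiveCachePolicy {

    public enum Mode { UNKNOWN, HAS_DUPLICATES, ALWAYS_UNIQUE }

    private final int probeSamples; // executions to observe before deciding (assumed value)
    private final Set<String> seenHashes = new HashSet<>();
    private int observed;
    private int duplicates;
    private Mode mode = Mode.UNKNOWN;

    public AdaptiveCachePolicy(int probeSamples) {
        this.probeSamples = probeSamples;
    }

    /** Records one response hash while UNKNOWN; decides the mode after probeSamples observations. */
    public synchronized Mode observe(String responseHash) {
        if (mode != Mode.UNKNOWN) {
            return mode; // decision already made, no more bookkeeping
        }
        observed++;
        if (!seenHashes.add(responseHash)) {
            duplicates++;
        }
        if (observed >= probeSamples) {
            // Simplest possible rule: any repeat during the probe means caching may pay off
            mode = duplicates > 0 ? Mode.HAS_DUPLICATES : Mode.ALWAYS_UNIQUE;
            seenHashes.clear(); // release the probe memory
        }
        return mode;
    }

    /** Only consult the parse cache once the sampler has shown repeated responses. */
    public synchronized boolean shouldUseCache() {
        return mode == Mode.HAS_DUPLICATES;
    }
}
```

A real rule would probably compare the duplicate ratio with a break-even hit rate rather than accept a single repeat, but the shape of the switch is the same.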
>>
>>
>> Philippe> Maybe the user could somehow indicate when to optimize and when not?
>>
>> The fewer the knobs, the better the UX. I would try some
>> automatic solution first, then semi-automatic, then fully manual.
>>
>>
>>
>> > > 4) What if we implement "fetch links only during the first sampler
>> > > execution"?
>> > >
>> >
>> > Can you give more details on your idea ?
>> >
>>
>> On the first sampler execution, do proper HTML parsing and collect the
>> external links. Then put on a poker face and just assume that this
>> particular test element will always return the same set of resources no
>> matter what.
>> Of course it will not work for cases like
>> url=${home_or_product_page_based_on_the_moons_phase}, but in certain cases
>> where the sampler is dedicated to one particular type of page it might
>> work just fine.
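The "parse only on the first execution" idea reduces to a map from sampler identity to the link list found the first time. In the sketch below the `samplerId` string is an assumption (how a sampler would actually be identified inside JMeter is left open), and later bodies are ignored completely, which is exactly the poker-face assumption and its limitation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical "parse once per sampler" link cache (illustrative only). */
public class FirstExecutionLinkCache {
    private final Map<String, List<String>> linksBySampler = new ConcurrentHashMap<>();

    /**
     * Returns the links found on this sampler's first execution.
     * computeIfAbsent runs the parser at most once per samplerId; bodies of
     * later executions are not even looked at, so changed pages go unnoticed.
     */
    public List<String> links(String samplerId, String body, Function<String, List<String>> parser) {
        return linksBySampler.computeIfAbsent(samplerId, id -> List.copyOf(parser.apply(body)));
    }
}
```

Compared with hashing every response, this removes both the hash and the parse after the first call, but it would silently fetch the wrong resources for a sampler like the `${home_or_product_page_based_on_the_moons_phase}` example.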
>>
>>
>> Vladimir
>>
>
>
>
> --
> Regards.
> Philippe Mouawad.
