[ https://issues.apache.org/jira/browse/TS-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330416#comment-14330416 ]

Luca Bruno commented on TS-3395:
--------------------------------

For you, maybe. I'm not sure how to explain it better: we don't want to depend on
some kind of magic tuning with magic numbers. We want software that achieves that
hit ratio even when stressed, and that does not depend on any particular max
connections setting.
You avoid those cases; we instead want to be sure certain things never happen. If
this is not ATS's job, please let me know so I can look further.

> Hit ratio drops with high concurrency
> -------------------------------------
>
>                 Key: TS-3395
>                 URL: https://issues.apache.org/jira/browse/TS-3395
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Luca Bruno
>             Fix For: 5.3.0
>
>
> I'm running some tests and I've noticed that the hit ratio drops with more than
> 300 simultaneous HTTP connections.
> The cache is on a 500 GB raw disk and is not full, so there is no eviction. The
> RAM cache is disabled.
> The test is done with Web Polygraph. Content sizes vary uniformly from 5 KB to
> 20 KB, the expected hit ratio is 60%, there are 2000 HTTP connections, and
> documents expire after months. There's no Vary.
> !http://i.imgur.com/Zxlhgnf.png!
> Then I thought it could be a problem with Polygraph, so I wrote my own
> client/server test code; it also works fine with Squid, Varnish and nginx. I
> register a hit if I get either cR or cH in the headers.
> {noformat}
> 2015/02/19 12:38:28 Starting 1000000 requests
> 2015/02/19 12:37:58 Elapsed: 3m51.23552164s
> 2015/02/19 12:37:58 Total average: 231.235µs/req, 4324.60req/s
> 2015/02/19 12:37:58 Average size: 12.50kb/req
> 2015/02/19 12:37:58 Bytes read: 12498412.45kb, 54050.57kb/s
> 2015/02/19 12:37:58 Errors: 0
> 2015/02/19 12:37:58 Offered Hit ratio: 59.95%
> 2015/02/19 12:37:58 Measured Hit ratio: 37.20%
> 2015/02/19 12:37:58 Hit bytes: 4649000609
> 2015/02/19 12:37:58 Hit success: 599476/599476 (100.00%), 469.840902ms/req
> 2015/02/19 12:37:58 Miss success: 400524/400524 (100.00%), 336.301464ms/req
> {noformat}
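The hit counting described above can be sketched as follows. This is a minimal Go sketch, not the reporter's actual test code. It assumes the `cH`/`cR` codes are read from Traffic Server's verbose `Via` response header. The function names `isCacheHit` and `hitRatio` and the sample `Via` values are hypothetical. The sketch also reproduces the offered ratio from the counts in the log (599476 offered hits out of 1000000 requests).

```go
package main

import (
	"fmt"
	"strings"
)

// isCacheHit reports whether a Via header value contains one of the two
// cache-lookup codes counted as a hit in the report: "cH" or "cR".
// Only the bracketed code section is searched, so a hostname that happens
// to contain "cH" is not mistaken for a hit.
func isCacheHit(via string) bool {
	start := strings.Index(via, "[")
	end := strings.LastIndex(via, "]")
	if start < 0 || end <= start {
		return false // no code section: treat as not a hit
	}
	codes := via[start+1 : end]
	return strings.Contains(codes, "cH") || strings.Contains(codes, "cR")
}

// hitRatio returns hits/total as a percentage, or 0 when total is 0.
func hitRatio(hits, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(total)
}

func main() {
	// Hypothetical Via values for illustration, not captured from the test.
	vias := []string{
		"http/1.1 proxy (ApacheTrafficServer [cHs f ])",
		"http/1.1 proxy (ApacheTrafficServer [cMsSf ])",
		"http/1.1 proxy (ApacheTrafficServer [cRs f ])",
	}
	hits := 0
	for _, v := range vias {
		if isCacheHit(v) {
			hits++
		}
	}
	fmt.Printf("measured: %.2f%%\n", hitRatio(hits, len(vias))) // measured: 66.67%
	// Offered ratio from the log's counts: 599476 expected hits of 1000000.
	fmt.Printf("offered:  %.2f%%\n", hitRatio(599476, 1000000)) // offered:  59.95%
}
```

The gap between the two ratios in the log (59.95% offered, 37.20% measured) is the share of requests that should have been hits but were not.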
> So, similar results: 37.20% on average. Then I thought it could be a problem
> with how I'm testing, so I tried the nginx cache. It achieves a 60% hit
> ratio, but its request rate is much lower than ATS's, for obvious reasons.
> Then I wanted to check whether the hit ratio also dropped with 200 connections
> and a longer test time, but no, it's fine:
> !http://i.imgur.com/oMHscuf.png!
> So it's not a problem with my tests, I guess.
> Then, by debugging the test server, I realized that the same URL was being
> requested twice.
> Out of 1000000 requests, 78600 URLs were requested at least twice. One URL was
> even requested 9 times. These repeated requests for the same URL are not close
> to each other: more than 30 seconds can pass between two requests for the same URL.
> I also tweaked the following parameters:
> {noformat}
> CONFIG proxy.config.http.cache.fuzz.time INT 0
> CONFIG proxy.config.http.cache.fuzz.min_time INT 0
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.000000
> CONFIG proxy.config.http.cache.max_open_read_retries INT 4
> CONFIG proxy.config.http.cache.open_read_retry_time INT 500
> {noformat}
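Two of the settings above work together. According to the Traffic Server `records.config` documentation, `open_read_retry_time` is given in milliseconds. With 4 retries of 500 ms each, a request can therefore spend roughly 4 * 500 ms = 2000 ms retrying the cache read before it gives up. The Go sketch below parses `CONFIG <name> <TYPE> <value>` lines and computes that rough bound. The helper names are hypothetical, and the bound ignores the time each lookup itself takes.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseRecords reads records.config-style lines of the form
// "CONFIG <name> <TYPE> <value>" into a name -> value map.
// Comments, blank lines and values containing spaces are skipped.
func parseRecords(text string) map[string]string {
	vals := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 4 && f[0] == "CONFIG" {
			vals[f[1]] = f[3]
		}
	}
	return vals
}

// worstCaseRetryWaitMs roughly bounds the time spent retrying the cache open
// read: retries * retry interval (ms). Non-positive retry counts mean no retrying.
func worstCaseRetryWaitMs(vals map[string]string) (int, error) {
	retries, err := strconv.Atoi(vals["proxy.config.http.cache.max_open_read_retries"])
	if err != nil {
		return 0, err
	}
	if retries <= 0 {
		return 0, nil
	}
	interval, err := strconv.Atoi(vals["proxy.config.http.cache.open_read_retry_time"])
	if err != nil {
		return 0, err
	}
	return retries * interval, nil
}

func main() {
	cfg := `CONFIG proxy.config.http.cache.max_open_read_retries INT 4
CONFIG proxy.config.http.cache.open_read_retry_time INT 500`
	ms, err := worstCaseRetryWaitMs(parseRecords(cfg))
	if err != nil {
		panic(err)
	}
	fmt.Printf("~%d ms\n", ms) // ~2000 ms
}
```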
> And this is the result with Polygraph; the results are similar:
> !http://i.imgur.com/YgOndhY.png!
> I also tweaked the read-while-writer option, and still got similar results.
> Then I enabled 1 GB of RAM cache; it is slightly better at the beginning, but
> then it drops:
> !http://i.imgur.com/dFTJI16.png!
> traffic_top says 25% RAM hit, 37% fresh, 63% cold.
> So, given that it doesn't seem to be a concurrency problem when requesting the
> URL from the origin server, could it be a problem with concurrent write access
> to the cache, so that some pages are not cached at all? The traffic_top fresh
> percentage also makes me think it could be a problem with writing to the cache.
> I'm not sure I explained the problem correctly; ask me for further information
> if needed. In summary: the hit ratio drops with a high number of connections,
> and the problem seems related to pages that are not written to the cache.
> This is a related issue:
> http://mail-archives.apache.org/mod_mbox/trafficserver-users/201301.mbox/%3ccd28cb1f.1f44a%25peter.wa...@email.disney.com%3E
> Also this: 
> http://apache-traffic-server.24303.n7.nabble.com/why-my-proxy-node-cache-hit-ratio-drops-td928.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
