[jira] Commented: (TS-489) Seg Fault with Connection_Collapsing and clustering enabled.

Ricky Chan (JIRA) Thu, 10 Mar 2011 04:01:25 -0800

    [ https://issues.apache.org/jira/browse/TS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005040#comment-13005040 ]


Ricky Chan commented on TS-489:
-------------------------------

For me neither of these is ideal. I already have read-while-writer enabled, 
which does what it says but can still flood origins when simultaneous requests 
arrive for the same uncached object, because at that point none of the 
connections is yet known to be retrieving the object.
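
To make the flooding problem concrete, here is a minimal sketch of the general 
request-collapsing idea: the first request for a key fetches from the origin, 
and concurrent requests for the same key wait for that result. This is an 
illustration only, not Traffic Server's implementation; the names SingleFlight, 
slow_origin and demo are hypothetical.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Collapse concurrent fetches of the same key into one origin call."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical origin-fetch callable
        self._lock = threading.Lock()
        self._inflight = {}        # key -> Future shared by all waiters
        self.collapsed = 0         # calls that joined an in-flight fetch

    def get(self, key):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = self._inflight[key] = Future()
            else:
                self.collapsed += 1
        if leader:
            # Only the first caller for this key contacts the origin.
            try:
                fut.set_result(self._fetch(key))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._inflight[key]
        # Followers block here until the leader publishes the result.
        return fut.result()


def demo(clients=50):
    """Send `clients` simultaneous requests for one uncached object."""
    origin_hits = []
    release = threading.Event()

    def slow_origin(key):
        origin_hits.append(key)
        release.wait()             # hold the response until everyone has queued
        return b"payload"

    sf = SingleFlight(slow_origin)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(sf.get("/object")))
        for _ in range(clients)
    ]
    for t in threads:
        t.start()
    while sf.collapsed < clients - 1:
        time.sleep(0.001)          # wait until all followers have joined
    release.set()
    for t in threads:
        t.join()
    return len(origin_hits), len(results)


if __name__ == "__main__":
    print(demo())
```

In this sketch all clients are served from a single origin fetch; without the 
shared in-flight table, each of them would have contacted the origin.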

I understand your point though, so I'll look at ways of reorganising the cache 
hierarchy to mitigate origin hits until better features become available.

I have currently tweaked the logic to almost never refresh content at all, 
because under load (a few thousand hits for the same cache object) it hits 
the origin way too much.

I haven't found a happy medium in the fuzziness settings (which aren't 
documented) whereby it won't hit origins during sudden surges on a cached 
object but still allows refreshing during quiet periods.  Also, it doesn't 
help if the object has never been seen before.
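
A rough way to see why no single setting suits both cases, under an assumed 
model: suppose every request arriving within fuzz.time seconds before expiry 
independently triggers an early refresh with probability fuzz.probability. 
That model is my reading of the settings rather than documented behaviour, and 
fuzz_estimate is a hypothetical helper.

```python
def fuzz_estimate(req_per_sec, fuzz_time=240, fuzz_probability=0.000005):
    """Estimate early refreshes inside the fuzz window.

    Assumes (not confirmed by documentation) that each request in the
    window is an independent trial with probability `fuzz_probability`.
    Returns (expected number of refreshes, probability of at least one).
    """
    n = req_per_sec * fuzz_time                       # requests in the window
    expected = n * fuzz_probability
    p_any = 1.0 - (1.0 - fuzz_probability) ** n
    return expected, p_any


if __name__ == "__main__":
    for rate in (1, 3000):
        expected, p_any = fuzz_estimate(rate)
        print(f"{rate} req/s: expected={expected:.4f}, P(any)={p_any:.4f}")
```

With the values from the issue description (fuzz.time 240, fuzz.probability 
0.000005), a surge of 3000 requests per second gives about 3.6 expected early 
refreshes per window, while at 1 request per second an early refresh is very 
unlikely (about 0.0012 expected). Under this model the refresh count scales 
linearly with load, which matches the difficulty described above.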




> Seg Fault with Connection_Collapsing and clustering enabled.
> ------------------------------------------------------------
>
>                 Key: TS-489
>                 URL: https://issues.apache.org/jira/browse/TS-489
>             Project: Traffic Server
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: Debian Lenny.
> 2.6.26-2-amd-64
> Sun Blade X6240 (2 x Six-Core AMD Opteron(tm) Processor 2439 SE)
> 64G Memory
>            Reporter: Ricky Chan
>            Assignee: mohan_zl
>            Priority: Critical
>             Fix For: 2.1.6
>
>         Attachments: TS-489-zym-1.txt, TS-489.patch, code_clean_up.patch, 
> collapse1.trace, collapse2.trace, ts_489_testing.txt
>
>
> Bug is easily reproduced, with the following setup.
> Traffic Server 2.0.0
> Enable Clustering (so you'll need two machines, and make sure the cluster is 
> actually working) (LOCAL proxy.local.cluster.type INT 1)
> Enable Connection Collapsing (CONFIG 
> proxy.config.connection_collapsing.hashtable_enabled INT 1)
> Other changes to records.config which may or may not affect it are changes to 
> heuristics:
> CONFIG proxy.config.http.cache.heuristic_min_lifetime INT 5
> CONFIG proxy.config.http.cache.heuristic_max_lifetime INT 86400
> CONFIG proxy.config.http.cache.heuristic_lm_factor FLOAT 0.000100
> CONFIG proxy.config.http.cache.fuzz.time INT 240
> CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.000005
> Using a 3rd machine, run apache benchmark (ab) with, say, -n 1000000, keep 
> alive (-k) and -c 8000.  I found it happens all the time above 8000.  I just 
> fetched a file from an origin running lighttpd which had a cache-control 
> header of max-age 86400, so as to reduce hitting the origin.  Size of file is 
> 9 bytes only.
> Note: You need to set ulimit -n very high and set sysctl ip_local_port_range 
> to larger than defaults to be able to run the test.  I did ulimit -n 1000000 
> and had sysctl -w net.ipv4.ip_local_port_range="1024 65000" to be able to run 
> ab.
> With either clustering or connection collapsing disabled, the program no 
> longer crashes.
> I then added a GDB wrapper around traffic_server and it clearly shows it's the 
> connection collapsing API which is at fault here.
> I'll add these traces as attachments.
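
For anyone re-running the reproduction, the load step quoted above could be 
scripted roughly as follows. The flags -n, -c and -k are the ones named in the 
description; the helper name ab_command and the proxy URL are placeholders I 
made up, not values from the report.

```python
import shlex


def ab_command(url, requests=1_000_000, concurrency=8000, keep_alive=True):
    """Build the ApacheBench argument list described in the report."""
    argv = ["ab", "-n", str(requests), "-c", str(concurrency)]
    if keep_alive:
        argv.append("-k")          # reuse connections, as in the report
    argv.append(url)
    return argv


if __name__ == "__main__":
    # Placeholder URL; substitute the proxy address and test object.
    print(shlex.join(ab_command("http://proxy.example:8080/test.txt")))
```

The ulimit and ip_local_port_range adjustments from the note above still have 
to be applied on the load-generating machine before running the command.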

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
