[ 
https://issues.apache.org/jira/browse/TS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232196#comment-13232196
 ] 

Zhao Yongming commented on TS-1114:
-----------------------------------

When tracking down this issue, we went in two directions: 
Weijin looked into why the event is "8", when there should not be any event 
"8" in the event system. In other core dumps we are sure the event is not a 
real event either; it shows up as random data. That turned out to be really 
interesting, because it points to one of two causes: 1, the old data (which 
may or may not be the same event) was freed but the event was not canceled; 
2, someone overwrote the data in this event. Weijin followed this lead and 
found that the action cancel code may cause problems in certain situations. 
He made a patch in our tree and we applied it to half of our servers, which 
have run without any crash for weeks.

At the same time, Koutai worked on making the vector reads and writes safer, 
even in some very strange situations. He patched half of our servers, and 
they have run without any crash too.

After careful discussion, we concluded that Weijin's patch is the one we need 
to keep, and here comes the patch.

Back to TS-857: when I look back at it, there is some strange event in the 
back trace, we have only , is that the same issue here? Where is the action 
canceled without mutex protection? If we consider TS-1114 a good fix, then we 
should treat TS-857 as the same crash.
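
To make the question concrete, here is a minimal sketch of the pattern being 
asked about. It is not the attached cache_crash.diff and does not use the real 
Traffic Server classes; Continuation, Event, dispatch and 
demo_cancel_before_dispatch are hypothetical stand-ins. The assumption is that 
cancel and dispatch must take the same lock, so a canceled event can never 
reach a handler whose data may already be freed.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a continuation: owns the lock and the handler.
struct Continuation {
    std::mutex mutex;     // protects this continuation's state
    int handled = 0;      // number of events actually delivered
    int last_event = -1;  // last event code seen by the handler

    void handle_event(int event) {
        ++handled;
        last_event = event;
    }
};

// Hypothetical stand-in for a scheduled event that can be canceled.
struct Event {
    Continuation *cont;
    int callback_event;
    bool cancelled = false;

    Event(Continuation *c, int ev) : cont(c), callback_event(ev) {}

    // Cancel while holding the continuation's lock, so the flag cannot
    // change in the middle of a dispatch on another thread.
    void cancel() {
        std::lock_guard<std::mutex> lock(cont->mutex);
        cancelled = true;
    }
};

// Check the cancelled flag under the same lock before calling back.
// Returns true if the handler ran.
bool dispatch(Event &e) {
    std::lock_guard<std::mutex> lock(e.cont->mutex);
    if (e.cancelled) {
        return false;  // canceled: never touch the handler's data
    }
    e.cont->handle_event(e.callback_event);
    return true;
}

// Single-threaded walk-through: one event is delivered, one is canceled
// before dispatch. Returns how many events reached the handler.
int demo_cancel_before_dispatch() {
    Continuation c;
    Event first(&c, 1);
    Event second(&c, 2);

    bool ran_first = dispatch(first);
    second.cancel();
    bool ran_second = dispatch(second);

    assert(ran_first && !ran_second);
    return c.handled;
}
```

If cancel() set the flag without taking the lock, a dispatch already past its 
check could still call the handler after the caller believed the event was 
canceled, which is the kind of window the question above is about.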

So far I am not sure how many crashes remain after patching with TS-1114; I 
just don't get many new back traces for this issue. TS-1114 may account for 
many strange crashes, as it makes the system behave really strangely.
                
> Crash report: HttpTransactCache::SelectFromAlternates
> -----------------------------------------------------
>
>                 Key: TS-1114
>                 URL: https://issues.apache.org/jira/browse/TS-1114
>             Project: Traffic Server
>          Issue Type: Bug
>            Reporter: Zhao Yongming
>            Assignee: weijin
>             Fix For: 3.1.4
>
>         Attachments: cache_crash.diff
>
>
> it may or may not be the upstream issue, let us open it for tracking.
> {code}
> #0  0x000000000053075e in HttpTransactCache::SelectFromAlternates 
> (cache_vector=0x2aaab80ff500, client_request=0x2aaab80ff4c0, 
>     http_config_params=0x2aaab547b800) at ../../proxy/hdrs/HTTP.h:1375
> 1375    ((int32_t *) & val)[0] = m_alt->m_object_key[0];
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
