[ https://issues.apache.org/jira/browse/TS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830551#comment-13830551 ]

Leif Hedstrom edited comment on TS-2351 at 11/23/13 4:20 AM:
-------------------------------------------------------------

So, Punkley confirmed that the issue is in fact that we enter this state 
with no cache_info.object_read set. I'm still not sure how that happens, but unless 
amc can find something better, maybe we can just improve the situation for now 
with:

{code}
diff --git a/proxy/http/HttpTransact.cc b/proxy/http/HttpTransact.cc
index 049e672..d6b7605 100644
--- a/proxy/http/HttpTransact.cc
+++ b/proxy/http/HttpTransact.cc
@@ -8685,7 +8685,7 @@ HttpTransact::change_response_header_because_of_range_request(State *s, HTTPHdr

     header->field_attach(field);
     header->set_content_length(s->range_output_cl);
-  } else {
+  } else if (s->cache_info.object_read) {
     char numbers[RANGE_NUMBERS_LENGTH];
     header->field_delete(MIME_FIELD_CONTENT_RANGE, MIME_LEN_CONTENT_RANGE);
     field = header->field_create(MIME_FIELD_CONTENT_RANGE, MIME_LEN_CONTENT_RANGE);
{code}

This would allow us to set the CL: header properly for these requests. I'm 
*guessing*, from talking with Punkley, that there's some sort of race here: a 
large number of clients send Range: requests on a cache miss, then someone 
triggers a full response such that the object starts getting written to cache, 
which makes us trigger the range transform. However, for some reason, 
cache_info.object_read is not set for the occasional request, so we fail 
big time. I've tried to reproduce it under these conditions.
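
To make the failure mode concrete, here is a small stand-alone sketch (my own illustration; the struct and member names below are invented to mimic the shape of the code, not copied from HttpTransact.cc) of why the single-range branch needs cache_info.object_read, and what the one-line guard changes:

{code}
// Stand-alone illustration only -- types and names are made up.
#include <cstdint>
#include <cstdio>

struct ObjectRead {
  int64_t object_size_get() const { return 1048576; } // full cached object size
};

struct CacheInfo {
  ObjectRead *object_read = nullptr; // never gets set on the failing requests
};

struct State {
  CacheInfo cache_info;
  int64_t range_start = 0;
  int64_t range_end = 1023;
  int64_t range_output_cl = 1024; // bytes the range transform will actually emit
};

// Models the single-range branch: it needs the full object size (via
// object_read) to build "Content-Range: bytes <start>-<end>/<size>".
// Without the guard, object_read == NULL means a NULL dereference, which
// matches the segfault at HttpTransact.cc:8692 in the trace below; with the
// proposed "else if (s->cache_info.object_read)" the block is simply skipped.
static void build_single_range_headers(State *s) {
  if (s->cache_info.object_read == nullptr)
    return; // the patch: no object_read, so leave the headers alone

  const int64_t size = s->cache_info.object_read->object_size_get();
  std::printf("Content-Range: bytes %lld-%lld/%lld\n",
              (long long)s->range_start, (long long)s->range_end, (long long)size);
  std::printf("Content-Length: %lld\n", (long long)s->range_output_cl);
}

int main() {
  State s;                        // cache_info.object_read stays NULL, as in the bug
  build_single_range_headers(&s); // guarded: prints nothing instead of crashing
  return 0;
}
{code}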

The patch above obviously only masks the real underlying problem (the race 
condition), but we also need to get v4.1.2 out. The other option is to back out 
TS-1955 entirely, but that would always (guaranteed) send the wrong CL: header 
for every response served from a cache object that is still being filled while 
read-while-writer is enabled.
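
To spell out what "wrong CL:" means here, a rough sketch with made-up numbers (purely illustrative, not from the ticket): without the TS-1955 fixup, the full size of the object being filled goes out as Content-Length even though only the requested range is written to the client.

{code}
// Illustrative arithmetic only -- the values are invented.
#include <cstdint>
#include <cstdio>

int main() {
  const int64_t object_size = 10LL * 1024 * 1024; // cached object being filled
  const int64_t range_start = 0;
  const int64_t range_end   = 1023;               // client sent Range: bytes=0-1023

  // With the TS-1955 fixup: CL matches the bytes actually sent.
  const int64_t correct_cl = range_end - range_start + 1; // 1024

  std::printf("with TS-1955 fixup:      Content-Length: %lld\n", (long long)correct_cl);
  std::printf("with TS-1955 backed out: Content-Length: %lld (full object, wrong)\n",
              (long long)object_size);
  return 0;
}
{code}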

amc: thoughts?


> Range request crash in 4.1.x
> ----------------------------
>
>                 Key: TS-2351
>                 URL: https://issues.apache.org/jira/browse/TS-2351
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 4.1.1
>            Reporter: David Carlin
>            Assignee: Leif Hedstrom
>            Priority: Blocker
>             Fix For: 4.1.2
>
>
> I am seeing the following crash using the 4.1.0 and 4.1.1 artifacts posted on the mailing list.
> One host I upgraded crashed immediately. Others take hours or days for the crash to appear.
> {noformat}
> #0  0x0000000000544e59 in HttpTransact::change_response_header_because_of_range_request (s=0x2b2ff3f44288, header=0x2b2ff3f44968) at HttpTransact.cc:8692
> #1  0x0000000000545190 in HttpTransact::handle_content_length_header (s=0x2b2ff3f44288, header=0x2b2ff3f44968, base=<value optimized out>) at HttpTransact.cc:6559
> #2  0x000000000054c963 in HttpTransact::build_response (s=0x2b2ff3f44288, base_response=0x2b2ff3f44a28, outgoing_response=0x2b2ff3f44968, outgoing_version=<value optimized out>, status_code=HTTP_STATUS_NONE, reason_phrase=0x6bf28c "None") at HttpTransact.cc:7682
> #3  0x000000000054d2e0 in build_response (s=0x2b2ff3f44288) at HttpTransact.cc:7644
> #4  HttpTransact::handle_transform_ready (s=0x2b2ff3f44288) at HttpTransact.cc:4577
> #5  0x000000000051bfe8 in HttpSM::call_transact_and_set_next_state (this=0x2b2ff3f44220, f=<value optimized out>) at HttpSM.cc:6767
> #6  0x00000000005317ed in HttpSM::state_response_wait_for_transform_read (this=0x2b2ff3f44220, event=2000, data=0x2b3168082460) at HttpSM.cc:1210
> #7  0x00000000005306e8 in HttpSM::main_handler (this=0x2b2ff3f44220, event=2000, data=0x2b3168082460) at HttpSM.cc:2530
> #8  0x00000000004e9406 in handleEvent (this=0x2b31680823d8, event=1) at ../iocore/eventsystem/I_Continuation.h:146
> #9  TransformTerminus::handle_event (this=0x2b31680823d8, event=1) at Transform.cc:173
> #10 0x00000000006a636f in handleEvent (this=0x2b2fda6cc010, e=0x2b315c060c60, calling_code=1) at I_Continuation.h:146
> #11 EThread::process_event (this=0x2b2fda6cc010, e=0x2b315c060c60, calling_code=1) at UnixEThread.cc:145
> #12 0x00000000006a6eeb in EThread::execute (this=0x2b2fda6cc010) at UnixEThread.cc:196
> #13 0x00000000004c6ae4 in main (argv=<value optimized out>) at Main.cc:1686
> NOTE: Traffic Server received Sig 11: Segmentation fault
> /usr/localbin/traffic_server - STACK TRACE:
> /lib64/libpthread.so.0(+0x3aec80f500)[0x2af3d029a500]
> /usr/localbin/traffic_server(_ZN12HttpTransact47change_response_header_because_of_range_requestEPNS_5StateEP7HTTPHdr+0x219)[0x544e59]
> /usr/localbin/traffic_server(_ZN12HttpTransact28handle_content_length_headerEPNS_5StateEP7HTTPHdrS3_+0x280)[0x545190]
> /usr/localbin/traffic_server(_ZN12HttpTransact14build_responseEPNS_5StateEP7HTTPHdrS3_11HTTPVersion10HTTPStatusPKc+0x3e3)[0x54c963]
> /usr/localbin/traffic_server(_ZN12HttpTransact22handle_transform_readyEPNS_5StateE+0x70)[0x54d2e0]
> /usr/localbin/traffic_server(_ZN6HttpSM32call_transact_and_set_next_stateEPFvPN12HttpTransact5StateEE+0x28)[0x51bfe8]
> /usr/localbin/traffic_server(_ZN6HttpSM38state_response_wait_for_transform_readEiPv+0xed)[0x5317ed]
> /usr/localbin/traffic_server(_ZN6HttpSM12main_handlerEiPv+0xd8)[0x5306e8]
> /usr/localbin/traffic_server(_ZN17TransformTerminus12handle_eventEiPv+0x1d6)[0x4e9406]
> /usr/localbin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x8f)[0x6a636f]
> /usr/localbin/traffic_server(_ZN7EThread7executeEv+0x63b)[0x6a6eeb]
> /usr/localbin/traffic_server[0x6a520a]
> /lib64/libpthread.so.0(+0x3aec807851)[0x2af3d0292851]
> /lib64/libc.so.6(clone+0x6d)[0x3aec4e890d]
> {noformat}


