Hi,

I can’t reproduce the bug. I even installed FreeBSD :-) I’ve added Willy in
copy; maybe he will be able to reproduce it.

Thierry


> On 28 Apr 2018, at 01:36, PiBa-NL <[email protected]> wrote:
> 
> Hi Thierry,
> 
> On 27-4-2018 at 1:54, PiBa-NL wrote:
>> Hi Thierry,
>> 
>> On 26-4-2018 at 12:25, [email protected] wrote:
>>> Your trace shows a corrupted tree. Maybe it is due to the FreeBSD
>>> architecture and the corruption is not reproducible on Linux? I do not have
>>> FreeBSD for testing.
>>> 
>>> Regards,
>>> Thierry
>> 
>> My 'best' reproduction scenario involves around 50 to 100 requests total,
>> sometimes fewer, made from my Chrome browser (over a VPN connection which
>> adds a little latency.. a ping/reply to the haproxy server takes about 17 ms,
>> maybe that helps reproduce it.. :/ ).
>> I've changed the lua function 'print_r' to not write anything in its wr()
>> function, to rule out the FreeBSD console / ssh session causing the issue.
>> It stays the same though.
>> 
>> I've also been adding some printf statements around how the eb_tree is used,
>> which I think might be interesting:
>>    printf("eb32sc_insert(%d)\n",new);
>>    printf("eb32sc_next(%d)  leaf_p: %d \n",eb32, eb32->node.leaf_p);
>>    printf("eb32sc_delete(%d)\n",eb32);
>> 
>> The pattern I usually see here is that a task is inserted, the next task is
>> looked up, and the task gets deleted again.
>> In the last round before the crash I see that the task is inserted, then
>> deleted, and only afterwards is the next task retrieved for that same
>> 'item'.. which fails. Perhaps because it was the last task to run, and there
>> is no longer a higher root to jump further up the tree? I must admit I don't
>> fully grasp how the tree is traversed exactly.. I seem to see that at least
>> for the first task put into the tree there is some special handling..
>> On the other hand, if it isn't in the tree, is there technically still a
>> 'next' item?? (See the sketch below.)
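>> 
>> To illustrate what I mean (toy helper, not haproxy code): once a node has
>> been deleted its node.leaf_p is cleared -- that is exactly the "leaf_p: 0"
>> printed just before the segfault below -- and eb32sc_next() starts its walk
>> up the tree from that pointer without checking it, so asking for the 'next'
>> of a removed node dereferences garbage:
>> 
>>    #include <stddef.h>
>>    #include "eb32sctree.h"  /* header location depends on the source layout */
>> 
>>    /* hypothetical guard, only to show where a check would have to sit */
>>    static inline struct eb32sc_node *
>>    safe_eb32sc_next(struct eb32sc_node *eb32, unsigned long scope)
>>    {
>>        if (!eb32->node.leaf_p)   /* node already deleted from the tree */
>>            return NULL;          /* so there is no meaningful 'next'   */
>>        return eb32sc_next(eb32, scope);
>>    }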
>> 
>> Below is the last part of that logging, and the complete log from the start
>> is attached.. Perhaps it gives a clue?
>> 
>> Regards,
>> 
>> PiBa-NL (Pieter)
>> 
>> eb32sc_insert(35826016)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826016)  leaf_p: 8606272
>> eb32sc_delete(35826016)
>> task_wakeup  35826016  32
>> active_tasks_mask |= t->thread_mask  (35826016)
>> eb32sc_insert(35826016)
>> task_wakeup  35826656  8
>> active_tasks_mask |= t->thread_mask  (35826656)
>> eb32sc_insert(35826656)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826656)  leaf_p: 35826656
>> eb32sc_delete(35826656)
>> 00000013:TestSite.srvrep[0006:ffffffff]: HTTP/1.1 200 OK
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Refresh: 1
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Server: haproxy/webstats
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Content-Type: text/html
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Content-Length: 1778
>> eb32sc_delete(35826016)
>> 00000013:TestSite.srvcls[0006:ffffffff]
>> eb32sc_next(35826016)  leaf_p: 0
>> Segmentation fault (core dumped)
>> 
> I tried to dig a little further.. pretty sure these are the steps to the
> issue. The exact reproduction is probably rather timing sensitive though.
> A tcp connection 'stream task' gets closed before the applet was done
> running, and this removes the applet task from the tree. That applet task is,
> however, exactly what 'rq_next' is pointing to..
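> 
> As a toy illustration of that ordering (stand-in code, not the real
> process_runnable_tasks() or ebtree code): the loop remembers a pointer to
> the next runnable item before running the current one, and running the
> current one can free exactly that remembered item:
> 
>    #include <stdio.h>
>    #include <stdlib.h>
> 
>    struct item {
>        struct item *next;
>        void (*run)(struct item *self);
>        int id;
>    };
> 
>    static struct item *victim;   /* the applet task, in this scenario */
> 
>    /* like the stream task: its processing ends up in
>     * stream_int_shutw_applet() -> hlua_applet_http_release(),
>     * which tears down and frees another queued task */
>    static void run_stream(struct item *self)
>    {
>        printf("item %d runs and frees item %d\n", self->id, victim->id);
>        free(victim);
>    }
> 
>    static void run_noop(struct item *self)
>    {
>        printf("item %d runs\n", self->id);
>    }
> 
>    int main(void)
>    {
>        struct item *a = calloc(1, sizeof(*a));
>        struct item *b = calloc(1, sizeof(*b));
>        a->id = 16; a->run = run_stream; a->next = b;
>        b->id = 36; b->run = run_noop;
>        victim = b;
> 
>        for (struct item *cur = a, *next; cur; cur = next) {
>            next = cur->next;   /* cached, like rq_next                   */
>            cur->run(cur);      /* may free 'next' behind the loop's back */
>        }                       /* next iteration then reads freed memory */
>        return 0;               /* -> deliberately crashes, like haproxy  */
>    }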
> 
> The stack below seems to be what removes and frees the applet task.. I'm not
> entirely sure this is the exact one causing the crash a little later; it's
> not related to the 'log' below anyhow..
> 
> hlua_applet_http_release (ctx=0x802238780) at P:\Git\haproxy\src\hlua.c:6668
> si_applet_release (si=0x8022accf0) at 
> P:\Git\haproxy\include\proto\stream_interface.h:234
> stream_int_shutw_applet (si=0x8022accf0) at 
> P:\Git\haproxy\src\stream_interface.c:1506
> si_shutw (si=0x8022accf0) at 
> P:\Git\haproxy\include\proto\stream_interface.h:321
> process_stream (t=0x80222a8c0) at P:\Git\haproxy\src\stream.c:2161
> process_runnable_tasks () at P:\Git\haproxy\src\task.c:236
> run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2404
> run_thread_poll_loop (data=0x8022420a0) at P:\Git\haproxy\src\haproxy.c:2469
> main (argc=5, argv=0x7fffffffea68) at P:\Git\haproxy\src\haproxy.c:3060
> 
> 
>    *    Below is some printf output, mostly containing function names, plus a
> few extra prints inside the process_runnable_tasks loop to show the current
> and next task...
> active_tasks_mask |= t->thread_mask  (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827936  rq_current:35827936
> eb32sc_next(35827936)  leaf_p: 8610368
> eb32sc_delete(35827936)
> process task 35827936  rq_next:0
> task_wakeup  35827936  32
> active_tasks_mask |= t->thread_mask  (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827936  rq_current:35827936
> eb32sc_next(35827936)  leaf_p: 8610368
> eb32sc_delete(35827936)
> process task 35827936  rq_next:0
> task_wakeup  35827936  32
> active_tasks_mask |= t->thread_mask  (35827936)
> eb32sc_insert(35827936)
> task_wakeup  35827616  8
> active_tasks_mask |= t->thread_mask  (35827616)
> eb32sc_insert(35827616)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827616  rq_current:35827616
> eb32sc_next(35827616)  leaf_p: 35827616
> eb32sc_delete(35827616)
>    *    Here processing of task ..16 starts and rq_next is filled with ..36..
> process task 35827616  rq_next:35827936
> 0000002b:TestSite.srvrep[0014:ffffffff]: HTTP/1.1 200 OK
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Type: text/html
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Server: haproxy/webstats
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Refresh: 1
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Length: 1779
>    *    There is a part that says some states may not happen for applets, but
> the other state SI_ST_EST above falls through to that part; maybe that's
> okay, maybe not.
> stream_int_shutw_applet, may not happen si->state:7 si->flags:16820 
> ic->flags:-2080370622
> si_applet_release
> hlua_applet_http_release
>    *    While ..16 is running, ..36 gets deleted and freed.
> task_delete (task:35827936)
> eb32sc_delete(35827936)
> 0000002b:TestSite.srvcls[0014:ffffffff]
> find rq_next, current task 35827936  rq_current:35827936
>   *     In the next loop iteration the 'next' of ..36 is read, but ..36 itself
> is no longer part of the tree..
> eb32sc_next(35827936)  leaf_p: 0
>   *     Coredump..
> 
> Perhaps the task should not be deleted and freed directly, but flagged 'to be
> destroyed', and then later in the process_runnable_tasks loop the actual
> remove & free should be done? (A rough sketch of what I mean follows below.)
> I hope this is enough info to maybe try and patch it even without a
> reproduction? Or maybe it can be reproduced by 'forcing' some creative delays
> here and there to get the 'correct' order of events?
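> 
> Roughly what I have in mind, as a sketch only (hypothetical, invented names,
> not a real patch): the release path would merely set a flag on the task, and
> the scheduler loop would do the actual unlink + free at a point where no
> cached 'next' pointer can still reference it:
> 
>    #include <stdlib.h>
> 
>    struct item {
>        struct item *next;
>        int id;
>        int to_be_destroyed;        /* set instead of freeing right away */
>    };
> 
>    static void request_destroy(struct item *it)
>    {
>        it->to_be_destroyed = 1;    /* defer the free to the scheduler */
>    }
> 
>    static void scheduler_pass(struct item **queue)
>    {
>        struct item **prev = queue;
> 
>        while (*prev) {
>            struct item *cur = *prev;
> 
>            if (cur->to_be_destroyed) {
>                *prev = cur->next;  /* safe point: nothing cached can */
>                free(cur);          /* still point at cur here        */
>                continue;
>            }
>            /* ... otherwise run cur as usual ... */
>            prev = &cur->next;
>        }
>    }
> 
>    int main(void)
>    {
>        struct item *a = calloc(1, sizeof(*a));
>        struct item *b = calloc(1, sizeof(*b));
>        a->id = 16; a->next = b;
>        b->id = 36;
>        struct item *queue = a;
> 
>        request_destroy(b);         /* e.g. the applet task being released */
>        scheduler_pass(&queue);     /* b is unlinked and freed here safely */
>        free(a);
>        return 0;
>    }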
> 
> Regards,
> 
> PiBa-NL (Pieter)

