Hi,

I can’t reproduce the bug. I even installed a FreeBSD :-) I’ve added Willy in copy; maybe he can reproduce it.
Thierry

> On 28 Apr 2018, at 01:36, PiBa-NL <[email protected]> wrote:
>
> Hi Thierry,
>
> On 27-4-2018 at 1:54, PiBa-NL wrote:
>> Hi Thierry,
>>
>> On 26-4-2018 at 12:25, [email protected] wrote:
>>> Your trace shows a corrupted tree. Maybe it is due to the FreeBSD
>>> architecture and the corruption is not reproducible on Linux? I do not
>>> have FreeBSD for testing.
>>>
>>> Regards,
>>> Thierry
>>
>> My 'best' reproduction scenario involves around 50 to 100 requests total,
>> sometimes fewer, made from my Chrome browser (over a VPN connection which
>> adds a little latency; a ping/reply to the haproxy server takes +-17 ms,
>> maybe that helps reproduce it :/ ).
>> I've changed the Lua function 'print_r' to not write anything in its wr()
>> function, to rule out the FreeBSD console / SSH session causing the issue.
>> The behaviour stays the same though.
>>
>> I've also been adding some printf statements around how the eb_tree is
>> used, which I think might be interesting:
>>
>>   printf("eb32sc_insert(%d)\n",new);
>>   printf("eb32sc_next(%d) leaf_p: %d \n",eb32, eb32->node.leaf_p);
>>   printf("eb32sc_delete(%d)\n",eb32);
>>
>> The pattern I see here is that usually a task is inserted, the next task
>> is looked up, and the task gets deleted again.
>> In the last round before the crash I see that the task is inserted, then
>> deleted, and only afterwards is the next task retrieved for that same
>> 'item', which fails. Perhaps because it was the last task to run, and
>> there is no longer a higher root to jump higher up the tree? I must admit
>> I don't fully grasp how the tree is traversed exactly. I seem to see that
>> at least for the first task put into the tree there is some special
>> handling. On the other hand, if it isn't in the tree, is there technically
>> still a 'next' item?
>>
>> Below is the last part of that logging; the complete log from the start is
>> also attached. Perhaps it gives a clue?
>>
>> Regards,
>> PiBa-NL (Pieter)
>>
>> eb32sc_insert(35826016)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826016) leaf_p: 8606272
>> eb32sc_delete(35826016)
>> task_wakeup 35826016 32
>> active_tasks_mask |= t->thread_mask (35826016)
>> eb32sc_insert(35826016)
>> task_wakeup 35826656 8
>> active_tasks_mask |= t->thread_mask (35826656)
>> eb32sc_insert(35826656)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826656) leaf_p: 35826656
>> eb32sc_delete(35826656)
>> 00000013:TestSite.srvrep[0006:ffffffff]: HTTP/1.1 200 OK
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Refresh: 1
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Server: haproxy/webstats
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Content-Type: text/html
>> 00000013:TestSite.srvhdr[0006:ffffffff]: Content-Length: 1778
>> eb32sc_delete(35826016)
>> 00000013:TestSite.srvcls[0006:ffffffff]
>> eb32sc_next(35826016) leaf_p: 0
>> Segmentation fault (core dumped)
>>
> Tried to dig a little further; I'm pretty sure these are the steps to the
> issue. The exact reproduction is probably rather timing-sensitive though.
> A TCP connection 'stream task' gets closed before the applet was done
> running, and this removes the applet task from the tree. This applet task
> is, however, what 'rq_next' is pointing to.
>
> The stack below seems to be what removes and frees the applet task. I'm
> not exactly sure if this is the exact one causing the crash a little
> later; it is not related to the 'log' below anyhow.
>
> hlua_applet_http_release (ctx=0x802238780) at P:\Git\haproxy\src\hlua.c:6668
> si_applet_release (si=0x8022accf0) at P:\Git\haproxy\include\proto\stream_interface.h:234
> stream_int_shutw_applet (si=0x8022accf0) at P:\Git\haproxy\src\stream_interface.c:1506
> si_shutw (si=0x8022accf0) at P:\Git\haproxy\include\proto\stream_interface.h:321
> process_stream (t=0x80222a8c0) at P:\Git\haproxy\src\stream.c:2161
> process_runnable_tasks () at P:\Git\haproxy\src\task.c:236
> run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2404
> run_thread_poll_loop (data=0x8022420a0) at P:\Git\haproxy\src\haproxy.c:2469
> main (argc=5, argv=0x7fffffffea68) at P:\Git\haproxy\src\haproxy.c:3060
>
> * Below is some printf output, mostly containing function names, and a few
>   extra inside the process_runnable_tasks loop to show what was the current
>   and next task...
>
> active_tasks_mask |= t->thread_mask (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827936 rq_current:35827936
> eb32sc_next(35827936) leaf_p: 8610368
> eb32sc_delete(35827936)
> process task 35827936 rq_next:0
> task_wakeup 35827936 32
> active_tasks_mask |= t->thread_mask (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827936 rq_current:35827936
> eb32sc_next(35827936) leaf_p: 8610368
> eb32sc_delete(35827936)
> process task 35827936 rq_next:0
> task_wakeup 35827936 32
> active_tasks_mask |= t->thread_mask (35827936)
> eb32sc_insert(35827936)
> task_wakeup 35827616 8
> active_tasks_mask |= t->thread_mask (35827616)
> eb32sc_insert(35827616)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827616 rq_current:35827616
> eb32sc_next(35827616) leaf_p: 35827616
> eb32sc_delete(35827616)
> * Here processing of task ..16 starts and rq_next is filled with ..36..
> process task 35827616 rq_next:35827936
> 0000002b:TestSite.srvrep[0014:ffffffff]: HTTP/1.1 200 OK
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Type: text/html
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Server: haproxy/webstats
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Refresh: 1
> 0000002b:TestSite.srvhdr[0014:ffffffff]: Content-Length: 1779
> * There is a part that says some states may not happen for applets, but
>   the other state SI_ST_EST above falls through to that part; maybe that's
>   okay, maybe not.
> stream_int_shutw_applet, may not happen si->state:7 si->flags:16820 ic->flags:-2080370622
> si_applet_release
> hlua_applet_http_release
> * While ..16 is running, that ..36 gets deleted and freed.
> task_delete (task:35827936)
> eb32sc_delete(35827936)
> 0000002b:TestSite.srvcls[0014:ffffffff]
> find rq_next, current task 35827936 rq_current:35827936
> * On the next loop the 'next' of ..36 is read, but ..36 itself is no
>   longer part of the tree..
> eb32sc_next(35827936) leaf_p: 0
> * Coredump..
>
> Perhaps the task should not be deleted and freed directly, but flagged 'to
> be destroyed', and then later in the process_runnable_tasks loop the actual
> remove & free should be done?
> I hope this is enough info to maybe try and patch it even without a
> reproduction? Or maybe one can try and reproduce it by 'forcing' some
> creative delays here and there to get the 'correct' order of events?
>
> Regards,
>
> PiBa-NL (Pieter)

