> On Sep 11, 2017, at 7:13 PM, Dima Pasechnik <dimpase+...@gmail.com> wrote:
>
>> On Mon, Sep 4, 2017 at 11:15 AM, Daniel Kochmański <dan...@turtleware.eu>
>> wrote:
>> From the backtrace it is clear that the failure occurs inside the call to
>> GC_init. Such errors are known to happen when another GC has already been
>> initialized on the system (I've linked the issue). It might be
>> caused by something else in bdwgc, I don't know. Either way I'd focus on
>> the GC_init part.
>
> Our project (sagemath) only uses libgc within the embedded ECL. Thus I
> am really puzzled how another libgc instance might kick in and spoil
> the game for ECL.
>
> One possibility is that clang is using libgc, and thus, in principle,
> libgc might be sitting somewhere in the runtime?!
>
>>
>> To confirm my assertion, you may put a printf before and
>> after the call to GC_init. I'm not familiar enough with bdwgc internals to
>> say what is wrong, though. Maybe updating the bundled GC sources will help?
>> Or linking with libgc on the system? It might be a bug in bdwgc
>> that has already been fixed.
>
> We are not using the bdwgc shipped with ECL; we use a separate libgc
> 7.6.0, which is the latest stable.
> (Is there a reason to ship bdwgc sources with ECL - do you patch it?)
>
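On the question of how another libgc could end up in the process: a rough way to check from the Python side, before the ECL extension module is imported, is to ask the dynamic loader whether bdwgc symbols already resolve. This is only a sketch under assumptions. `symbol_visible` is a helper name made up for this example, it relies on `ctypes.CDLL(None)` (a handle to the running process, POSIX only), and it will not see a copy of the GC that is linked statically with hidden symbols or loaded with local scope.

```python
import ctypes


def symbol_visible(name):
    """Return True if `name` resolves in the process's global symbol namespace.

    Hypothetical helper for illustration; not part of ECL, bdwgc, or Sage.
    """
    # CDLL(None) wraps dlopen(NULL): the main program plus globally loaded libraries.
    process = ctypes.CDLL(None)
    # Attribute lookup performs a dlsym(); a missing symbol raises AttributeError,
    # which hasattr() turns into False.
    return hasattr(process, name)


if __name__ == "__main__":
    # GC_init and GC_malloc are public bdwgc entry points.
    for sym in ("GC_init", "GC_malloc"):
        print(sym, "visible before importing ECL:", symbol_visible(sym))
```

If either symbol is reported as visible before the ECL module is loaded, something else has pulled a libgc into the process. A negative result does not rule out a second, privately linked copy.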
I'm using ECL with a non-embedded bdwgc as well, and I don't have this issue. Make sure that bdwgc is not also built statically into ECL. I would expect linking problems in that case, but it is worth double-checking.

> Thanks,
> Dima
>
>>
>> Regards,
>>
>> Daniel
>>
>>> On 04.09.2017 12:04, Dima Pasechnik wrote:
>>>
>>> On Fri, Sep 1, 2017 at 1:57 PM, Daniel Kochmański <dan...@turtleware.eu>
>>> wrote:
>>>>
>>>> I don't think it's related to shared vs static - rather two GCs running
>>>> concurrently. Try commenting out the GC_init call in ECL and see what
>>>> happens.
>>>
>>> I don't understand how two GCs can run concurrently on a memory region
>>> controlled by ECL which is statically linked to GC...
>>> In fact I am pretty sure no other instances of GC are running anywhere
>>> within our process tree.
>>>
>>> By the way, I don't know whether it's obvious from the backtrace that
>>> cl_boot() has been completed, or not.
>>>
>>> If it actually was completed, could it be a bug that invalidates the
>>> bit indicating that cl_boot() has been done?
>>>
>>> We have seen similar troubles with clang recently, related to FPE.
>>> There an FPE bit was flipped by assignment of a double to an
>>> integer type (sic!).
>>> It took us a lot of head banging on various hard surfaces to debug this:
>>> https://trac.sagemath.org/ticket/22799
>>> it turned out we did hit a known bug:
>>> https://bugs.llvm.org//show_bug.cgi?id=17686
>>>
>>>> Do you need sigchld for anything? Run-program was rewritten and sigchld
>>>> handling wasn't a viable option for it anymore.
>>>>
>>> We do set ECL_OPT_TRAP_SIGCHLD to 0, thus I presume we
>>> now can simply skip it altogether.
>>>
>>> Thanks,
>>> Dima
>>>
>>>> I'm on my phone, will be available after the weekend.
>>>>
>>>> Regards, D.
>>>>
>>>>
>>>> On 1 September 2017 14:47:57 CEST, Dima Pasechnik
>>>> <dimpase+...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>> Thanks for the message.
>>>>> The scenario you talk about only happens if GC
>>>>> is a shared library, right?
>>>>>
>>>>> I've rebuilt GC disabling shared libs, and ECL doing static linking to
>>>>> GC.
>>>>> And I still get very similar segfaults:
>>>>>
>>>>> ;;; ECL C Backtrace
>>>>> ;;; 0 ecl_internal_error (0x87d79b375)
>>>>> ;;; 1 init_unixint (0x87d7c17e0)
>>>>> ;;; 2 init_unixint (0x87d7c1582)
>>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>>> ;;; 5 unknown (0x7ffffffff193)
>>>>> ;;; 6 GC_push_current_stack (0x87d7ef7c3)
>>>>> ;;; 7 GC_with_callee_saves_pushed (0x87d7f7360)
>>>>> ;;; 8 GC_push_roots (0x87d7ef9c2)
>>>>> ;;; 9 GC_mark_some (0x87d7ec97c)
>>>>> ;;; 10 GC_stopped_mark (0x87d7e6b7a)
>>>>> ;;; 11 GC_try_to_collect_inner (0x87d7e6a75)
>>>>> ;;; 12 GC_init (0x87d7f08ea)
>>>>> ;;; 13 init_alloc (0x87d7d5669)
>>>>> ;;; 14 cl_boot (0x87d69f66b)
>>>>> ...
>>>>>
>>>>> And a very similar picture on the develop branch of ECL - although
>>>>> I had to change our code, as in particular
>>>>> ECL_OPT_TRAP_SIGCHLD is gone...
>>>>>
>>>>> So, what can it be? Some signals issue?
>>>>>
>>>>> Thanks,
>>>>> Dima
>>>>>
>>>>> On Fri, Sep 1, 2017 at 7:38 AM, Daniel Kochmański <dan...@turtleware.eu>
>>>>> wrote:
>>>>>>
>>>>>> Hey Dima,
>>>>>>
>>>>>> this looks like the issue with having GC initialized before ECL kicks
>>>>>> in.
>>>>>> See https://gitlab.com/embeddable-common-lisp/ecl/issues/371 for a
>>>>>> discussion about this problem. Basically some other component already
>>>>>> called
>>>>>> GC_init and ECL calls it once more. It's arguably not a bug.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>>> On 31.08.2017 15:29, Dima Pasechnik wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I'm struggling to understand strange segfaults coming from
>>>>>>> ECL(+Maxima) on FreeBSD embedded into Python; they typically look as
>>>>>>> follows:
>>>>>>>
>>>>>>> Got signal before environment was installed on our thread
>>>>>>> [2: No such file or directory]
>>>>>>>
>>>>>>> ;;; ECL C Backtrace
>>>>>>> ;;; 0 ecl_internal_error (0x87d790765)
>>>>>>> ;;; 1 init_unixint (0x87d7b6bd0)
>>>>>>> ;;; 2 init_unixint (0x87d7b6972)
>>>>>>> ;;; 3 pthread_sigmask (0x80103779d)
>>>>>>> ;;; 4 pthread_getspecific (0x801036d6f)
>>>>>>> ;;; 5 unknown (0x7ffffffff193)
>>>>>>> ;;; 6 GC_push_all_stacks (0x87db1ea2c)
>>>>>>> ;;; 7 GC_mark_some (0x87db12eec)
>>>>>>> ;;; 8 GC_stopped_mark (0x87db09baa)
>>>>>>> ;;; 9 GC_try_to_collect_inner (0x87db09a75)
>>>>>>> ;;; 10 GC_init (0x87db16f4f)
>>>>>>> ;;; 11 init_alloc (0x87d7caa59)
>>>>>>> ;;; 12 cl_boot (0x87d694a5b)
>>>>>>> ;;; 13 initecl (0x87d218340)
>>>>>>> ;;; 14 initecl (0x87d20a43f)
>>>>>>> ;;; 15 initecl (0x87d207e28)
>>>>>>> ;;; 16 _PyImport_LoadDynamicModule (0x800b3ed1c)
>>>>>>> ;;; 17 PyImport_AppendInittab (0x800b3d71f)
>>>>>>> ;;; 18 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>> ;;; 19 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>> ;;; 20 _PyBuiltin_Init (0x800b162d7)
>>>>>>> ;;; 21 PyObject_Call (0x800a7d3e3)
>>>>>>> ;;; 22 PyEval_EvalFrameEx (0x800b2121c)
>>>>>>> ;;; 23 PyEval_EvalCodeEx (0x800b1b5d4)
>>>>>>> ;;; 24 PyEval_EvalCode (0x800b1ad96)
>>>>>>> ;;; 25 PyImport_ExecCodeModuleEx (0x800b3ad11)
>>>>>>> ;;; 26 PyImport_AppendInittab (0x800b3ddb8)
>>>>>>> ;;; 27 PyImport_AppendInittab (0x800b3d71f)
>>>>>>> ;;; 28 PyImport_AppendInittab (0x800b3d1a8)
>>>>>>> ;;; 29 PyImport_ImportModuleLevel (0x800b3c2ce)
>>>>>>> ;;; 30 _PyBuiltin_Init (0x800b162d7)
>>>>>>> ;;; 31 PyEval_EvalFrameEx (0x800b22dd1)
>>>>>>> Segmentation fault (core dumped)
>>>>>>>
>>>>>>> It
>>>>>>> looks as if ECL (version 16.1.2) is being called before an
>>>>>>> initialisation is complete, but is it possible to say more without a
>>>>>>> debugger?
>>>>>>>
>>>>>>> More details: this is on FreeBSD 11.0, clang 3.8.0, GC version 7.6.0
>>>>>>> with libatomic_ops version 7.4.6.
>>>>>>> And only reproducible on FreeBSD.
>>>>>>>
>>>>>>> ECL is built with --disable-threads; GC is built with or without
>>>>>>> threads---result is still the same.
>>>>>>> (so it's unclear to me where pthread_* calls in the trace
>>>>>>> come from).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dima
>>>>>>>
>>>>>>> PS. the segfault is at the bottom of
>>>>>>> https://trac.sagemath.org/ticket/22679#comment:87
>>>>>>
>>>>>>
>>>>>>
>>>> -- Sent with K-9 Mail.