Hi Thomas,

Sounds like it could be this issue:
http://www.gnu.org/software/freeipmi/freeipmi-faq.html#Why-am-I-seeing-so-many-_0027internal-IPMI-error_0027-messages_003f

While I haven't seen segfaults myself yet, it's very plausible that they could happen.

Al

On Mon, 2013-12-02 at 18:16 +0100, Thomas Cadeau wrote:
> Hi all,
>
> I'm back with the same result on "clean" nodes.
>
> Using a simple program, I get an answer in 3 ms with the driver active:
> $ lsmod | grep ipmi
> ipmi_devintf 8145 2
> ipmi_si 42497 1
> and an answer in 17 ms without the driver.
>
> Inside our project, we have no problem with the drivers loaded, but
> without them we always get the memory issue described below.
> Note that this will be a real problem with the new RHEL kernel.
> We really need to fix this.
>
> I will come back tomorrow to see which parts of our code we can share
> for the moment.
> The project is quite big, but the only difference from the simple
> program that I can see is the call inside the thread.
>
> Thomas
>
> On 22/11/2013 16:51, Thomas Cadeau wrote:
> > Thanks a lot for your answer.
> >
> > The approach you propose does not fit what we want to do.
> > I re-ran it on "safe" CPUs without any trouble.
> >
> > Once I have a real pool of CPUs without any other problems, I
> > will let you know if the problem happens again.
> >
> > Thomas
> >
> > On 21/11/2013 20:23, Albert Chu wrote:
> >> Hi Thomas,
> >>
> >> I did a quick sanity test on my system and it worked (of course, it may
> >> not have been exactly the way you did things).
> >>
> >> The trace indicates the segfault is here:
> >>
> >>> #0 0x00007f4e278c89a9 in inb (ctx=0x7f4e28001770) at
> >>>> /usr/include/sys/io.h:48
> >> That is during memory-mapped I/O. I suppose a segfault could happen if
> >> the in/out call was going to a bad part of memory. It might suggest
> >> some corruption is happening. Is it possible you're corrupting some
> >> data structure somewhere? Perhaps the close/destroy/re-create works
> >> because it fixes the corruption?
> >>
> >> In all of FreeIPMI (especially the multi-ranged host access in the
> >> tools), we create a context per thread for communication, e.g.
> >>
> >>   launch_thread
> >>     ctx = ipmi_ctx_create();
> >>     ipmi_ctx_find_inband(ctx, ...);
> >>     loop
> >>       ipmi_cmd_raw
> >>
> >> Have you considered doing it this way?
> >>
> >> Al
> >>
> >>
> >> On Thu, 2013-11-21 at 17:00 +0100, Thomas Cadeau wrote:
> >>> Hi all,
> >>>
> >>>
> >>> I'm currently trying to call a raw command several times.
> >>> Here are the functions I call:
> >>>
> >>>> ctx = ipmi_ctx_create()
> >>>>
> >>>> ipmi_ctx_find_inband (ctx,
> >>>>     NULL,               //&driver_type,
> >>>>     0,                  // disable_auto_probe,
> >>>>     0,                  // driver_address,
> >>>>     0,                  // register_spacing,
> >>>>     0,                  // driver_device,
> >>>>     0,                  // workaround_flags,
> >>>>     IPMI_FLAGS_DEFAULT  //0
> >>>> )
> >>>>
> >>>> ipmi_cmd_raw(ctx,
> >>>>     0x00,               //lun (logical unit number)
> >>>>     0x3A,               //IPMI_NET_FN_SENSOR_EVENT_RQ,
> >>>>     bytes_rq,           //request data //const void *
> >>>>     2,                  //length (in bytes)
> >>>>     bytes_rs,           //response buffer //void *
> >>>>     IPMI_RAW_MAX_ARGS   //max response length
> >>>> )
> >>> I check all return codes.
> >>>
> >>> If I create a simple example with a loop, I have no problem.
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ... )
> >>>> for (...){
> >>>>     ipmi_cmd_raw(...)
> >>>>     //use result
> >>>> }
> >>> Then I tried it inside an internal project: during initialization I
> >>> call the 3 functions, and then each time I want to update and call
> >>> ipmi_cmd_raw(...), a thread is created to do all the operations.
> >>>
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ... )
> >>>> ipmi_cmd_raw(...)
> >>>> //use result
> >>>> ...
> >>>> //with fixed frequency:
> >>>> launch thread
> >>>>     ipmi_cmd_raw(...)
> >>>>     //use result
> >>> In this case, on some CPUs, I have no problem.
> >>> But on others, I get a segfault (core dump):
> >>>> #0  0x00007f4e278c89a9 in inb (ctx=0x7f4e28001770)
> >>>>     at /usr/include/sys/io.h:48
> >>>> #1  _ipmi_kcs_get_status (ctx=0x7f4e28001770)
> >>>>     at driver/ipmi-kcs-driver.c:533
> >>>> #2  0x00007f4e278c8e50 in _ipmi_kcs_wait_for_ibf_clear (ctx=0x7f4e28001770)
> >>>>     at driver/ipmi-kcs-driver.c:656
> >>>> #3  0x00007f4e278c91d6 in ipmi_kcs_write (ctx=0x7f4e28001770,
> >>>>     buf=0x7f4e28003420, buf_len=3) at driver/ipmi-kcs-driver.c:845
> >>>> #4  0x00007f4e27898bc1 in _kcs_cmd_write (ctx=0x7f4e28005190,
> >>>>     obj_cmd_rq=<value optimized out>, obj_cmd_rs=0x7f4e28001ae0)
> >>>>     at api/ipmi-kcs-driver-api.c:255
> >>>> #5  api_kcs_cmd (ctx=0x7f4e28005190, obj_cmd_rq=<value optimized out>,
> >>>>     obj_cmd_rs=0x7f4e28001ae0) at api/ipmi-kcs-driver-api.c:398
> >>>> #6  0x00007f4e27899091 in api_kcs_cmd_raw (ctx=0x7f4e28005190,
> >>>>     buf_rq=0x7f4e2e390a60, buf_rq_len=2, buf_rs=0x7f4e2e38f8c0,
> >>>>     buf_rs_len=4512) at api/ipmi-kcs-driver-api.c:750
> >>>> #7  0x00007f4e2788f9a9 in ipmi_cmd_raw (ctx=0x7f4e28005190,
> >>>>     lun=<value optimized out>, net_fn=<value optimized out>,
> >>>>     buf_rq=0x7f4e2e390a60, buf_rq_len=2, buf_rs=0x7f4e2e38f8c0,
> >>>>     buf_rs_len=4512) at api/ipmi-api.c:1983
> >>> If I force a reconnect, I have no problem, but this workaround is
> >>> not a good solution:
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ... )
> >>>> ipmi_cmd_raw(...)
> >>>> //use result
> >>>> ...
> >>>> //with fixed frequency:
> >>>> launch thread
> >>>>     ipmi_ctx_close(ctx)
> >>>>     ipmi_ctx_destroy(ctx);
> >>>>     ctx = ipmi_ctx_create()
> >>>>     ipmi_ctx_find_inband ( ... )
> >>>>     ipmi_cmd_raw(...)
> >>>>     //use result
> >>> Note that I checked the BMC version on each node, and I use
> >>> freeipmi-1.2.1.
> >>> I also have a safeguard to ensure that ctx can only be used by one
> >>> caller at a time.
> >>>
> >>> Do you have any idea what is happening, or whether I'm doing
> >>> something wrong?
> >>> Is there a function to check whether the connection is open and
> >>> whether I need to reopen it?
> >>>
> >>> Thank you for your help.
> >>>
> >>> Thomas Cadeau
> >>>
> >>> _______________________________________________
> >>> Freeipmi-devel mailing list
> >>> [email protected]
> >>> https://lists.gnu.org/mailman/listinfo/freeipmi-devel
> >
>

--
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
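Al's suggestion of one context per thread (create the context, find the in-band interface, then loop over raw commands, all inside the thread) can be sketched in C with POSIX threads. This is only an illustration of that structure under stated assumptions, not a confirmed fix for the segfault: to keep it compilable without libfreeipmi or access to a BMC, the FreeIPMI calls are replaced by hypothetical stubs (`stub_ctx_open`, `stub_cmd_raw`, `stub_ctx_close`, plus the helper `run_workers`), and the comments name the real call each stub stands for.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in for ipmi_ctx_t, so the sketch builds without
 * libfreeipmi or a BMC. It only records which thread created it. */
typedef struct {
    pthread_t owner;   /* thread that created this context */
    int commands;      /* simulated raw commands sent so far */
} stub_ctx;

/* Stands in for ipmi_ctx_create() followed by ipmi_ctx_find_inband(). */
static stub_ctx *stub_ctx_open(void)
{
    stub_ctx *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        ctx->owner = pthread_self();
        ctx->commands = 0;
    }
    return ctx;
}

/* Stands in for ipmi_cmd_raw(). The assert is a sketch-only check that
 * the context is never used outside the thread that created it. */
static int stub_cmd_raw(stub_ctx *ctx)
{
    assert(pthread_equal(ctx->owner, pthread_self()));
    ctx->commands++;
    return 0;
}

/* Stands in for ipmi_ctx_close() followed by ipmi_ctx_destroy(). */
static void stub_ctx_close(stub_ctx *ctx)
{
    free(ctx);
}

typedef struct {
    int iterations;   /* raw commands this worker should send */
    int completed;    /* raw commands that succeeded */
} worker_args;

/* The "launch_thread" step: open a private context, loop, tear it down. */
static void *worker(void *arg)
{
    worker_args *w = arg;
    stub_ctx *ctx = stub_ctx_open();

    w->completed = 0;
    if (ctx == NULL)
        return NULL;
    for (int i = 0; i < w->iterations; i++) {
        if (stub_cmd_raw(ctx) < 0)   /* real code: check the return value */
            break;
        w->completed++;
    }
    stub_ctx_close(ctx);   /* the context is never shared with other threads */
    return NULL;
}

#define MAX_WORKERS 16

/* Starts up to MAX_WORKERS threads, each with its own context, joins them
 * all, and returns the total number of completed commands (or -1 if a
 * thread could not be created). */
static int run_workers(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    worker_args args[MAX_WORKERS];
    int started = 0, total = 0, failed = 0;

    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    for (; started < nthreads; started++) {
        args[started].iterations = iterations;
        args[started].completed = 0;
        if (pthread_create(&tids[started], NULL, worker, &args[started]) != 0) {
            failed = 1;
            break;
        }
    }
    /* Join every thread that was actually started, even after a failure. */
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].completed;
    }
    return failed ? -1 : total;
}
```

Built with `-pthread`, `run_workers(4, 10)` starts four workers that each send ten simulated commands through their own context and returns 40. With the real library, each worker would check the return values of ipmi_ctx_create(), ipmi_ctx_find_inband() and ipmi_cmd_raw() just as the original code does.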
