On Jan 10, 2014, at 12:45 PM, Adrian Reber <adr...@lisas.de> wrote:

> On Fri, Jan 10, 2014 at 09:48:14AM -0800, Ralph Castain wrote:
>>
>> On Jan 10, 2014, at 8:02 AM, Adrian Reber <adr...@lisas.de> wrote:
>>
>>> I am currently trying to understand how callbacks are working. Right now
>>> I am looking at orte/mca/rml/base/rml_base_receive.c
>>> orte_rml_base_comm_start() which does
>>>
>>> orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
>>>                         ORTE_RML_TAG_RML_INFO_UPDATE,
>>>                         ORTE_RML_PERSISTENT,
>>>                         orte_rml_base_recv,
>>>                         NULL);
>>>
>>> As far as I understand it, orte_rml_base_recv() is the callback function.
>>> At which point should this function run? When the data is actually
>>> received?
>>
>> Not precisely. When data is received by the OOB, it pushes the data into an
>> event. When that event gets serviced, it calls the orte_rml_base_receive
>> function, which processes the data to find the matching tag and then uses
>> that to execute the callback to the user code.
>>
>>>
>>> The same goes for the send_buffer_nb() functions. I do not see the callback
>>> functions actually running. How can I verify that the callback functions
>>> are running? Especially for the send case it sounds pretty obvious how
>>> it should work, but I never see the callback function running, at least
>>> in my setup.
>>
>> The data is not immediately sent. It gets pushed into an event. When that
>> event gets serviced, it calls the orte_oob_base_send function, which then
>> passes the data to each active OOB component until one of them says it can
>> send it. The data is then pushed into another event to get it into the event
>> base for that component's active module; when that event gets serviced, the
>> data is sent. Once the data is sent, an event is created that, when
>> serviced, executes the callback to the user code.
>>
>> If you aren't seeing callbacks, the most likely cause is that the orte
>> progress thread isn't running. Without it, none of this will work.
>
> Thanks.
> Running configure without '--with-ft=cr', I can run a program and
> use orte-top. In orterun I can see that the callback is running and
> orte-top displays the retrieved information. I can also see in orte-top
> that the callbacks are working.

Actually, I'm rather impressed. I hadn't tested orte-top and honestly didn't know whether it would still work! Glad to hear it does :-)

> Doing the same with '--with-ft=cr'
> enabled, orte-top crashes, as does orte-checkpoint. Both (-top and
> -checkpoint) seem to no longer have working callbacks, and that is probably
> why they are crashing. So some code that is enabled by '--with-ft=cr'
> seems to break callbacks in orte-top as well as in orte-checkpoint.
> orterun handles callbacks no matter whether it is configured with or without
> '--with-ft=cr'.

I can take a look this weekend; it's probably something silly.

>
> Adrian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
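To make the event-driven flow Ralph describes for receives and sends more concrete, here is a minimal single-threaded C sketch. It is a model under stated assumptions, not ORTE code: every name in it (post_event, progress, recv_nb, oob_deliver, send_nb, and the two "components") is hypothetical, a fixed-size FIFO stands in for the real event base, and an explicit progress() call stands in for the ORTE progress thread. What it illustrates is that posting a receive or a send only queues work; both the receive callback and the send-completion callback run only when something services the queue.

```c
/* Hypothetical sketch: every name here is illustrative, not ORTE's real API. */
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 16
#define MAX_RECVS  4

/* A fixed-size FIFO standing in for the real event base. */
typedef void (*event_fn_t)(void *arg);
typedef struct { event_fn_t fn; void *arg; } event_t;

static event_t queue[MAX_EVENTS];
static size_t q_head = 0, q_tail = 0;

static void post_event(event_fn_t fn, void *arg) {
    assert(q_tail - q_head < MAX_EVENTS && "event queue full");
    queue[q_tail % MAX_EVENTS] = (event_t){ fn, arg };
    q_tail++;
}

/* Stands in for the progress thread: service events until none remain.
 * If nothing ever calls this, no callback ever runs. Returns events serviced. */
static int progress(void) {
    int n = 0;
    while (q_head < q_tail) {
        event_t ev = queue[q_head % MAX_EVENTS];
        q_head++;
        ev.fn(ev.arg);   /* a handler may post more events; the loop picks them up */
        n++;
    }
    return n;
}

/* ---- Receive path: arrival -> event -> tag match -> user callback ---- */
typedef struct { int tag; char payload[64]; } message_t;
typedef void (*recv_cb_t)(const message_t *msg, void *cbdata);
typedef struct { int tag; int persistent; recv_cb_t cb; void *cbdata; int active; } posted_recv_t;

static posted_recv_t recvs[MAX_RECVS];

static int recv_nb(int tag, int persistent, recv_cb_t cb, void *cbdata) {
    for (int i = 0; i < MAX_RECVS; i++) {
        if (!recvs[i].active) {
            recvs[i] = (posted_recv_t){ tag, persistent, cb, cbdata, 1 };
            return 0;
        }
    }
    return -1;   /* no free slot */
}

static void match_and_deliver(void *arg) {
    message_t *m = arg;
    for (int i = 0; i < MAX_RECVS; i++) {
        if (recvs[i].active && recvs[i].tag == m->tag) {
            if (!recvs[i].persistent) recvs[i].active = 0;   /* one-shot recv */
            recvs[i].cb(m, recvs[i].cbdata);
            return;
        }
    }
    /* Unmatched: a fuller implementation might hold it; this sketch drops it. */
}

/* Called by the transport on arrival: it only queues an event. */
static void oob_deliver(message_t *m) { post_event(match_and_deliver, m); }

/* ---- Send path: event -> pick component -> module event -> send -> callback event ---- */
typedef void (*send_cb_t)(int status, void *cbdata);
typedef struct { int tag; const char *payload; send_cb_t cb; void *cbdata; } send_req_t;

static void send_complete(void *arg) {
    send_req_t *r = arg;
    r->cb(0, r->cbdata);   /* the user's send callback runs only here */
}

static void module_do_send(void *arg) {
    /* A real module would write the request's payload to the wire here. */
    post_event(send_complete, arg);   /* completion is itself an event */
}

static int comp_a_can_send(const send_req_t *r) { (void)r; return 0; }
static int comp_b_can_send(const send_req_t *r) { (void)r; return 1; }
static int (*const components[])(const send_req_t *) = { comp_a_can_send, comp_b_can_send };

static void oob_base_send(void *arg) {
    send_req_t *r = arg;
    for (size_t i = 0; i < sizeof components / sizeof components[0]; i++) {
        if (components[i](r)) {   /* first component that accepts wins */
            post_event(module_do_send, r);
            return;
        }
    }
    r->cb(-1, r->cbdata);   /* no component could send it */
}

static void send_nb(send_req_t *r) { post_event(oob_base_send, r); }

/* ---- Example user callbacks ---- */
static int recv_count = 0, send_done = 0;
static void on_recv(const message_t *m, void *cbdata) { (void)m; (void)cbdata; recv_count++; }
static void on_sent(int status, void *cbdata) { (void)cbdata; if (status == 0) send_done = 1; }
```

For example, posting a persistent receive on tag 42, delivering two tag-42 messages, and issuing one send queues three events and fires no callbacks. A single progress() call then services five events (two deliveries, the component selection, the module send, and the completion event) and runs on_recv twice and on_sent once. If progress() is never called, which is this model's analogue of the progress thread not running, neither callback ever fires.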