Here's the trace:

#0  0x00002aaaaae61737 in hwloc__xml_export_object (output=0x7fffffffd890, topology=0x695f10, obj=0x2aaaab139b28)
    at topology-xml.c:1094
#1  0x00002aaaaae61b69 in hwloc___nolibxml_prepare_export (topology=0x695f10,
    xmlbuffer=0x698a70 "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE topology SYSTEM \"hwloc.dtd\">\n<topology>\n  <object type=\"Unknown\" os_level=\"-1424778408\" os_index=\"10922\" cpuset=\"0xf...f\" complete_cpuset=\"0xf...f\" onl"...,
    buflen=16384) at topology-xml.c:1193
#2  0x00002aaaaae61be0 in hwloc__nolibxml_prepare_export (topology=0x695f10, bufferp=0x7fffffffd988, buflenp=0x7fffffffd97c)
    at topology-xml.c:1207
#3  0x00002aaaaae61d02 in opal_hwloc122_hwloc_topology_export_xmlbuffer (topology=0x695f10, xmlbuffer=0x7fffffffd988,
    buflen=0x7fffffffd97c) at topology-xml.c:1281
#4  0x00002aaaaae529f4 in opal_hwloc_compare (topo1=0x695f10, topo2=0x6915c0, type=22 '\026') at base/hwloc_base_dt.c:183
#5  0x00002aaaaadf348c in opal_dss_compare (value1=0x695f10, value2=0x6915c0, type=22 '\026') at dss/dss_compare.c:39
#6  0x00002aaaaad9b5f7 in process_orted_launch_report (fd=-1, event=1, data=0x6444d0) at base/plm_base_launch_support.c:564
#7  0x00002aaaaae3881f in event_process_active_single_queue (base=0x60dd60, activeq=0x6111e0) at event.c:1329
#8  0x00002aaaaae38c71 in event_process_active (base=0x60dd60) at event.c:1396
#9  0x00002aaaaae3902b in opal_libevent2012_event_base_loop (base=0x60dd60, flags=1) at event.c:1598
#10 0x00002aaaaadf080d in opal_progress () at runtime/opal_progress.c:189
#11 0x00002aaaaad9bbfa in orte_plm_base_daemon_callback (num_daemons=2) at base/plm_base_launch_support.c:666
#12 0x00002aaaaada49e1 in plm_slurm_launch_job (jdata=0x67a500) at plm_slurm_module.c:404
#13 0x0000000000403822 in orterun (argc=4, argv=0x7fffffffe1d8) at orterun.c:817
#14 0x0000000000402aa3 in main (argc=4, argv=0x7fffffffe1d8) at main.c:13

And the error report:

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaae61737 in hwloc__xml_export_object (output=0x7fffffffd890, topology=0x695f10, obj=0x2aaaab139b28)
    at topology-xml.c:1094
1094        sprintf(tmp, "%llu", (unsigned long long) obj->memory.page_types[i].count);
(gdb) print obj
$1 = (opal_hwloc122_hwloc_obj_t) 0x2aaaab139b28
(gdb) print *obj
$2 = {type = 2870188824, os_index = 10922, name = 0x2aaaab139b18 "\b\233\023\253\252*",
  memory = {total_memory = 6579376, local_memory = 6579376, page_types_len = 2870188856,
    page_types = 0x2aaaab139b38}, attr = 0x2aaaab139b48,
  depth = 2870188872, logical_index = 10922, os_level = -1424778408,
  next_cousin = 0x2aaaab139b58, prev_cousin = 0x2aaaab139b68, parent = 0x2aaaab139b68,
  sibling_rank = 2870188920, next_sibling = 0x2aaaab139b78, prev_sibling = 0x2aaaab139b88,
  arity = 2870188936, children = 0x2aaaab139b98, first_child = 0x2aaaab139b98,
  last_child = 0x2aaaab139ba8, userdata = 0x2aaaab139ba8,
  cpuset = 0x2aaaab139bb8, complete_cpuset = 0x2aaaab139bb8,
  online_cpuset = 0x2aaaab139bc8, allowed_cpuset = 0x2aaaab139bc8,
  nodeset = 0x2aaaab139bd8, complete_nodeset = 0x2aaaab139bd8,
  allowed_nodeset = 0x2aaaab139be8, distances = 0x2aaaab139be8,
  distances_count = 2870189048, infos = 0x2aaaab139bf8, infos_count = 2870189064}
(gdb) print obj->memory
$3 = {total_memory = 6579376, local_memory = 6579376, page_types_len = 2870188856, page_types = 0x2aaaab139b38}
(gdb) print obj->memory.page_types
$4 = (struct opal_hwloc122_hwloc_obj_memory_page_type_s *) 0x2aaaab139b38
(gdb) print i
$5 = 1612
(gdb) print obj->memory.page_types[1600]
$6 = {size = 0, count = 0}
(gdb) print obj->memory.page_types[1612]
Cannot access memory at address 0x2aaaab13fff8
(gdb) print obj->memory.page_types[1611]
$7 = {size = 0, count = 0}
(gdb) 
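
The faulting index can be cross-checked with a little address arithmetic. The sketch below assumes each page_types[] entry is two 64-bit fields (size, then count), 16 bytes in total, and that pages are 4 KiB. The struct and function names are made up for illustration and are not hwloc's.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one page_types[] entry: two 64-bit fields, 16 bytes. */
struct page_type_entry {
    uint64_t size;
    uint64_t count;
};

/* Assumption: 4 KiB pages. */
#define ASSUMED_PAGE_SIZE 4096u

/* Address of page_types[i], given the array base address. */
uint64_t element_address(uint64_t base, uint64_t i)
{
    return base + i * sizeof(struct page_type_entry);
}

/* Address of page_types[i].count, the field read on line 1094. */
uint64_t count_field_address(uint64_t base, uint64_t i)
{
    return element_address(base, i) + offsetof(struct page_type_entry, count);
}

/* Nonzero when addr is the first byte of a page. */
int is_page_start(uint64_t addr)
{
    return addr % ASSUMED_PAGE_SIZE == 0;
}
```

With base 0x2aaaab139b38 and i = 1612, element_address() gives 0x2aaaab13fff8, the address gdb refuses to read, and count_field_address() gives 0x2aaaab140000, which sits exactly on a page boundary. That fits a loop bounded by the garbage page_types_len walking off the end of mapped memory.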


The whole obj looks like trash to me. I looked a little more - the object 
referenced is the root object:

1193      hwloc__xml_export_object (&output, topology, hwloc_get_root_obj(topology));
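
One way to see that the object is trash rather than a partly filled hwloc object is to glue adjacent 32-bit fields from the gdb dump back together. This is a minimal sketch, assuming a little-endian 64-bit machine (the addresses in the trace look like x86-64 Linux); both helper names are hypothetical.

```c
#include <stdint.h>

/* Rebuild the 64-bit word occupied by two adjacent 32-bit fields.
 * On a little-endian machine the first field is the low half. */
uint64_t join_halves(uint32_t first_field, uint32_t second_field)
{
    return ((uint64_t)second_field << 32) | first_field;
}

/* Reinterpret a signed 32-bit field (such as os_level) as unsigned. */
uint32_t as_unsigned32(int32_t value)
{
    return (uint32_t)value;
}
```

join_halves(2870188824, 10922), that is type and os_index, gives 0x2aaaab139b18, the same value gdb prints for name. join_halves(2870188872, 10922), that is depth and logical_index, gives 0x2aaaab139b48, the value of attr. os_level read as unsigned is 0xab139b58, the low half of next_cousin. Every integer field is half of a pointer that repeats in the next slot, so the memory at obj appears to hold pairs of pointers, not a hwloc object.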

I'm continuing to look in case I'm doing something stupid, but the code is 
pretty linear here - unpack, import, export for compare.
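
For the compare step, once both topologies have been exported, the check reduces to comparing two buffers. A minimal sketch, assuming the exported lengths are plain ints as in the export calls quoted below; xml_buffers_equal is a hypothetical helper, not an hwloc or OPAL function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when both exported XML buffers hold
 * the same bytes, 0 otherwise (including on NULL or negative lengths). */
int xml_buffers_equal(const char *x1, int l1, const char *x2, int l2)
{
    if (x1 == NULL || x2 == NULL || l1 < 0 || l2 < 0)
        return 0;
    if (l1 != l2)
        return 0;
    return memcmp(x1, x2, (size_t)l1) == 0;
}
```

Comparing lengths first avoids reading past the end of the shorter buffer, and memcmp does not depend on either buffer being NUL-terminated.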


On Sep 24, 2011, at 8:59 AM, Jeff Squyres wrote:

> Here's some feedback from Ralph -- any idea what's going wrong here?
> 
> -----
> 
> 1. I export a topology into xml using
> 
>       hwloc_topology_export_xmlbuffer(t, &xmlbuffer, &len);
> 
> I then pack and send the string.
> 
> 2. I unpack the string on the other end and import it into a topology:
> 
>       hwloc_topology_init(&t);
>       if (0 != (rc = hwloc_topology_set_xmlbuffer(t, xmlbuffer, strlen(xmlbuffer)))) {
>           hwloc_topology_destroy(t);
>           goto cleanup;
>       }
>       hwloc_topology_load(t);
> 
> 3. I then need to compare two topologies, so I export the topology I received 
> into another xml string:
> 
>       hwloc_topology_export_xmlbuffer(t1, &x1, &l1);
> 
> It is this export that fails, which implies to me that somehow the import 
> didn't work right. Note that this code worked fine with libxml2, so this is a 
> regression.
> 
> 
> On Sep 22, 2011, at 9:39 AM, Jeff Squyres wrote:
> 
>> Yes, I can get some testing of the ompi branch pretty quickly.  I can bring 
>> in a new copy of this later today and see what we can see.
>> 
>> Many thanks!
>> 
>> 
>> On Sep 19, 2011, at 9:05 AM, Brice Goglin wrote:
>> 
>>> I pushed the new minimalistic XML import/export implementation without
>>> libxml2 to the nolibxml branch. If libxml2 is available, it's still used
>>> by default. --disable-libxml2 or some env variables can be used to
>>> force the minimalistic implementation if needed. The minimalistic
>>> implementation is only guaranteed to import XML files that were generated
>>> by hwloc (even if libxml was enabled there).
>>> 
>>> I also backported most of this to the new v1.2-ompi branch (required to
>>> backport some other XML cleanups from trunk). This branch will now serve
>>> as a base for Open MPI's embedded hwloc. The idea is to have a complete
>>> v1.2 + nolibxml somewhere so that we can at least run make check (Open
>>> MPI does not embed enough to run hwloc's make check).
>>> 
>>> How do we proceed now? Can we have the OMPI guys test the new code soon?
>>> Should I wait for their feedback before merging the nolibxml branch into
>>> the trunk? I'd like to merge this in v1.3 too (and basically release rc2
>>> as the actual first feature-complete RC), so getting feedback early
>>> might be appreciated.
>>> 
>>> Brice
>>> 
>>> _______________________________________________
>>> hwloc-devel mailing list
>>> hwloc-de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 

