Crud - sorry for delayed response. I was out for a bit.

I’ll just change it to %d as there is nothing magic about it being unsigned. 
How bizarre.
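
For anyone following along, here is a minimal sketch of the failure mode Gilles describes below. It is not the actual guess_strlen() from printf.c: the names guess_len() and guess_len_v() and the 21-character estimate per integer are assumptions made purely for illustration. The point is that every conversion which consumes an argument has to call va_arg() with the matching promoted type, including the "l" variants, or the later %s ends up reading an earlier integer as a pointer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified length estimator; NOT the real printf.c code. */
static size_t guess_len_v(const char *fmt, va_list ap)
{
    size_t len = strlen(fmt) + 1;
    const char *p;

    for (p = fmt; *p != '\0'; ++p) {
        int is_long = 0;

        if (*p != '%' || p[1] == '\0') {
            continue;
        }
        ++p;
        if (*p == 'l') {            /* "%ld", "%lu", "%lx" */
            is_long = 1;
            ++p;
            if (*p == '\0') {
                break;
            }
        }
        switch (*p) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            len += (s != NULL) ? strlen(s) : strlen("(null)");
            break;
        }
        case 'd':
        case 'i':
            /* Consume the argument with its real type. */
            if (is_long) {
                (void) va_arg(ap, long);
            } else {
                (void) va_arg(ap, int);
            }
            len += 21;              /* assumed worst case for 64 bits */
            break;
        case 'u':
        case 'x':
            /* Without this case, "%u" would leave its argument on the
             * list and a later "%s" would pick up the integer. */
            if (is_long) {
                (void) va_arg(ap, unsigned long);
            } else {
                (void) va_arg(ap, unsigned int);
            }
            len += 21;
            break;
        default:
            /* "%%" and anything unhandled consume nothing here. */
            break;
        }
    }
    return len;
}

size_t guess_len(const char *fmt, ...)
{
    size_t len;
    va_list ap;

    va_start(ap, fmt);
    len = guess_len_v(fmt, ap);
    va_end(ap);
    return len;
}
```

Called with the format string from the backtrace and seven unsigned counts followed by "sun4v", this sketch steps past all seven integers before it reaches the string, which is the behavior the missing case broke.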


> On Dec 12, 2014, at 3:21 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> NOTE:
> 
> The existing code for "%l." in guess_strlen() is garbage.
> The va_arg() macro calls all have "int" for the type!!
> 
> I am *only* testing a fix for the missing "%u" at the moment.
> 
> -Paul
> 
> On Fri, Dec 12, 2014 at 3:14 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Thanks, Gilles!
> 
> I was looking at that same code just now and completely missed the lack of a 
> case for '%u' (and '%lu').  I will add one now and see if that resolves the 
> problem....
> 
> 
> -Paul
> 
> On Fri, Dec 12, 2014 at 3:10 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Ralph,
> 
> I cannot find a case for the %u format in guess_strlen.
> And since the default case does not invoke va_arg(),
> it seems strlen is invoked on nnuma instead of arch.
> 
> Makes sense?
> 
> Cheers,
> 
> Gilles
> 
> Ralph Castain <r...@open-mpi.org> wrote:
> Afraid I’m drawing a blank, Paul - I can’t see how we got to a bad address 
> down there. This is at the beginning of orte_init, so there are no threads 
> running nor has anything much happened.
> 
> Do you have any suggestions?
> 
> 
>> On Dec 12, 2014, at 9:02 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>> Ralph,
>> 
>> The "arch" variable looks fine:
>> Current function is opal_hwloc_base_get_topo_signature
>>  2134                    nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt, arch);
>> (dbx) print arch
>> arch = 0x1001700a0 "sun4v"
>> 
>> And so is "fmt":
>> 
>> Current function is opal_asprintf
>>   194       length = opal_vasprintf(ptr, fmt, ap);
>> (dbx) print fmt
>> fmt = 0xffffffff7eeada98 "%uN:%uS:%uL3:%uL2:%uL1:%uC:%uH:%s"
>> 
>> However, things have gone bad in guess_strlen():
>> 
>> Current function is guess_strlen
>>    71                       len += (int)strlen(sarg);
>> (dbx) print sarg
>> sarg = 0x2 "<bad address 0x2>"
>> 
>> -Paul
>> 
>> On Fri, Dec 12, 2014 at 2:24 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hmmm….this is really odd. I actually do have a protection for that arch 
>> value being NULL, and you are in the code section when it isn’t.
>> 
>> Do you still have the core file around? If so, can you print out the value 
>> of the “arch” variable? It would be in the 
>> opal_hwloc_base_get_topo_signature level.
>> 
>> I’m wondering if that value has been hosed, and the problem is memory 
>> corruption somewhere.
>> 
>> 
>>> On Dec 11, 2014, at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> Thanks Paul - I will post a fix for this tomorrow. Looks like Sparc isn’t 
>>> returning an architecture type for some reason, and I didn’t protect 
>>> against it.
>>> 
>>> 
>>>> On Dec 11, 2014, at 7:39 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>> 
>>>> Backtrace for the Solaris-10/SPARC SEGV appears below.
>>>> I've changed the subject line to distinguish this from the earlier report.
>>>> 
>>>> -Paul
>>>> 
>>>> program terminated by signal SEGV (no mapping at the fault address)
>>>> 0xffffffff7d93b634: strlen+0x0014:      lduh     [%o2], %o1
>>>> Current function is guess_strlen
>>>>    71                       len += (int)strlen(sarg);
>>>> (dbx) where
>>>>   [1] strlen(0x2, 0x73000000, 0x2, 0x80808080, 0x2, 0x80808080), at 0xffffffff7d93b634
>>>> =>[2] guess_strlen(fmt = 0xffffffff7eeada98 "%uN:%uS:%uL3:%uL2:%uL1:%uC:%uH:%s", ap = 0xffffffff7ffff058), line 71 in "printf.c"
>>>>   [3] opal_vasprintf(ptr = 0xffffffff7ffff0b8, fmt = 0xffffffff7eeada98 "%uN:%uS:%uL3:%uL2:%uL1:%uC:%uH:%s", ap = 0xffffffff7ffff050), line 218 in "printf.c"
>>>>   [4] opal_asprintf(ptr = 0xffffffff7ffff0b8, fmt = 0xffffffff7eeada98 "%uN:%uS:%uL3:%uL2:%uL1:%uC:%uH:%s", ... = 0x807ede0103, ...), line 194 in "printf.c"
>>>>   [5] opal_hwloc_base_get_topo_signature(topo = 0x100128ea0), line 2134 in "hwloc_base_util.c"
>>>>   [6] rte_init(), line 205 in "ess_hnp_module.c"
>>>>   [7] orte_init(pargc = 0xffffffff7ffff61c, pargv = 0xffffffff7ffff610, flags = 4U), line 148 in "orte_init.c"
>>>>   [8] orterun(argc = 7, argv = 0xffffffff7ffff7a8), line 856 in "orterun.c"
>>>>   [9] main(argc = 7, argv = 0xffffffff7ffff7a8), line 13 in "main.c"
>>>> 
>>>> On Thu, Dec 11, 2014 at 7:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> No, that looks different - it’s failing in mpirun itself. Can you get a 
>>>> line number on it?
>>>> 
>>>> Sorry for delay - I’m generating rc3 now
>>>> 
>>>> 
>>>>> On Dec 11, 2014, at 6:59 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>> 
>>>>> Don't see an rc3 yet.
>>>>> 
>>>>> My Solaris-10/SPARC runs fail slightly differently (see below).
>>>>> It looks sufficiently similar that it MIGHT be the same root cause.
>>>>> However, lacking an rc3 to test I figured it would be better to report 
>>>>> this than to ignore it.
>>>>> 
>>>>> The problem is present with both V8+ and V9 ABIs, and with both Gnu and 
>>>>> Sun compilers.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> [niagara1:29881] *** Process received signal ***
>>>>> [niagara1:29881] Signal: Segmentation Fault (11)
>>>>> [niagara1:29881] Signal code: Address not mapped (1)
>>>>> [niagara1:29881] Failing at address: 2
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-pal.so.6.2.1:opal_backtrace_print+0x24
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-pal.so.6.2.1:0xaa160
>>>>> /lib/libc.so.1:0xc5364
>>>>> /lib/libc.so.1:0xb9e64
>>>>> /lib/libc.so.1:strlen+0x14 [ Signal 11 (SEGV)]
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-pal.so.6.2.1:opal_vasprintf+0x20
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-pal.so.6.2.1:opal_asprintf+0x30
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-pal.so.6.2.1:opal_hwloc_base_get_topo_signature+0x24c
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/openmpi/mca_ess_hnp.so:0x2d90
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/lib/libopen-rte.so.7.0.5:orte_init+0x2f8
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/bin/orterun:orterun+0xaa8
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/bin/orterun:main+0x14
>>>>> /sandbox/hargrove/OMPI/openmpi-1.8.4rc2-solaris10-sparcT2-gcc346-v8plus/INST/bin/orterun:_start+0x5c
>>>>> [niagara1:29881] *** End of error message ***
>>>>> Segmentation Fault - core dumped
>>>>> 
>>>>> On Thu, Dec 11, 2014 at 3:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Ah crud - incomplete commit means we didn’t send the topo string. Will 
>>>>> roll rc3 in a few minutes.
>>>>> 
>>>>> Thanks, Paul
>>>>> Ralph
>>>>> 
>>>>>> On Dec 11, 2014, at 3:08 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>>> 
>>>>>> Testing the 1.8.4rc2 tarball on my x86-64 Solaris-11 systems I am 
>>>>>> getting the following crash for both "-m32" and "-m64" builds:
>>>>>> 
>>>>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c
>>>>>> [pcp-j-19:18762] *** Process received signal ***
>>>>>> [pcp-j-19:18762] Signal: Segmentation Fault (11)
>>>>>> [pcp-j-19:18762] Signal code: Address not mapped (1)
>>>>>> [pcp-j-19:18762] Failing at address: 0
>>>>>> /shared/OMPI/openmpi-1.8.4rc2-solaris11-x64-ib-gcc452/INST/lib/libopen-pal.so.6.2.1'opal_backtrace_print+0x26 [0xfffffd7ffaf237ba]
>>>>>> /shared/OMPI/openmpi-1.8.4rc2-solaris11-x64-ib-gcc452/INST/lib/libopen-pal.so.6.2.1'show_stackframe+0x833 [0xfffffd7ffaf20ba1]
>>>>>> /lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff202cc6]
>>>>>> /lib/amd64/libc.so.1'call_user_handler+0x2aa [0xfffffd7fff1f648e]
>>>>>> /lib/amd64/libc.so.1'strcmp+0x1a [0xfffffd7fff170fda] [Signal 11 (SEGV)]
>>>>>> /shared/OMPI/openmpi-1.8.4rc2-solaris11-x64-ib-gcc452/INST/bin/orted'main+0x90 [0x4010b7]
>>>>>> /shared/OMPI/openmpi-1.8.4rc2-solaris11-x64-ib-gcc452/INST/bin/orted'_start+0x6c [0x400f2c]
>>>>>> [pcp-j-19:18762] *** End of error message ***
>>>>>> bash: line 1: 18762 Segmentation Fault      (core dumped) 
>>>>>> /shared/OMPI/openmpi-1.8.4rc2-solaris11-x64-ib-gcc452/INST/bin/orted 
>>>>>> -mca ess "env" -mca orte_ess_jobid "911343616" -mca orte_ess_vpid 1 -mca 
>>>>>> orte_ess_num_procs "2" -mca orte_hnp_uri 
>>>>>> "911343616.0;tcp://172.16.0.120,172.18.0.120:50362" 
>>>>>> --tree-spawn -mca btl "sm,self,openib" -mca plm "rsh" -mca 
>>>>>> shmem_mmap_enable_nfs_warning "0"
>>>>>> 
>>>>>> Running gdb against a core generated by the 32-bit build gives line 
>>>>>> numbers:
>>>>>> #0  0xfea1cb45 in strcmp () from /lib/libc.so.1
>>>>>> #1  0xfeef4900 in orte_daemon (argc=26, argv=0x80479b0)
>>>>>>     at /shared/OMPI/openmpi-1.8.4rc2-solaris11-x86-ib-gcc452/openmpi-1.8.4rc2/orte/orted/orted_main.c:789
>>>>>> #2  0x08050fb1 in main (argc=26, argv=0x80479b0)
>>>>>>     at /shared/OMPI/openmpi-1.8.4rc2-solaris11-x86-ib-gcc452/openmpi-1.8.4rc2/orte/tools/orted/orted.c:62
>>>>>> 
>>>>>> -Paul
>>>>>> 
>>>>>> -- 
>>>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>>> Computer Science Department               Tel: +1-510-495-2352
>>>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16514.php
>>>>> 
>>>> 
>> 
>> 
> 
