/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
-np 8 -mca btl sm,tcp --mca rtc_freq_priority 0
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
Program terminated with signal 11, Segmentation fault.


#0  orte_plm_base_post_launch (fd=<value optimized out>, args=<value
optimized out>, cbdata=0x7393b0) at base/plm_base_launch_support.c:607
607             opal_event_evtimer_del(timer->ev);
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.107.el6.x86_64 libgcc-4.4.7-3.el6.x86_64
libpciaccess-0.13.1-2.el6.x86_64 numactl-2.0.7-6.el6.x86_64
(gdb) bt
#0  orte_plm_base_post_launch (fd=<value optimized out>, args=<value
optimized out>, cbdata=0x7393b0) at base/plm_base_launch_support.c:607
#1  0x00007ffff7b1076c in event_process_active_single_queue (base=0x630d30,
flags=<value optimized out>) at event.c:1367
#2  event_process_active (base=0x630d30, flags=<value optimized out>) at
event.c:1437
#3  opal_libevent2021_event_base_loop (base=0x630d30, flags=<value
optimized out>) at event.c:1645
#4  0x000000000040501d in orterun (argc=10, argv=0x7fffffffe208) at
orterun.c:1080
#5  0x00000000004039e4 in main (argc=10, argv=0x7fffffffe208) at main.c:13


On Mon, Jun 2, 2014 at 3:31 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> OK,
>
> Please send me a clean gdb backtrace:
> ulimit -c unlimited
> /* this should generate a core */
> mpirun ...
> gdb mpirun core...
> bt
>
> if no core is generated:
> gdb mpirun
> r -np ... --mca ... ...
> and after the crash:
> bt
>
> Otherwise I can only review the code and hope I can find the root cause of an
> error I am unable to reproduce in my environment.
>
> Cheers,
>
> Gilles
>
>
>
>
> On Mon, Jun 2, 2014 at 9:03 PM, Mike Dubman <mi...@dev.mellanox.co.il>
> wrote:
>
>> Hi,
>> The Jenkins job took your commit and applied it automatically; I tried with
>> the mca flag afterwards.
>> Also, we don't have /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>> on our system; the cpuspeed daemon is off by default on all our nodes.
>>
>>
>> Regards
>> M
>>
>>
>> On Mon, Jun 2, 2014 at 3:00 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Mike,
>>>
>>> Did you apply the patch *and* run mpirun with --mca rtc_freq_priority 0?
>>>
>>> *Both* are required (--mca rtc_freq_priority 0 is not enough without the
>>> patch).
>>>
>>> Can you please confirm there is no
>>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>> (pseudo) file on your system?
>>>
>>> If this still does not work for you, then this might be a different
>>> issue that I was unable to reproduce.
>>> In that case, could you run mpirun under gdb and send a gdb stack trace?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>>
>>>
>>> On Mon, Jun 2, 2014 at 8:26 PM, Mike Dubman <mi...@dev.mellanox.co.il>
>>> wrote:
>>>
>>>> More info: specifying --mca rtc_freq_priority 0 explicitly generates a
>>>> different kind of failure:
>>>>
>>>> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>>> -np 8 -mca btl sm,tcp --mca rtc_freq_priority 0
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>>>> [vegas12:13887] *** Process received signal ***
>>>> [vegas12:13887] Signal: Segmentation fault (11)
>>>> [vegas12:13887] Signal code: Address not mapped (1)
>>>> [vegas12:13887] Failing at address: 0x20
>>>> [vegas12:13887] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>>> [vegas12:13887] [ 1]
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_plm_base_post_launch+0x90)[0x7ffff7dcbe50]
>>>> [vegas12:13887] [ 2]
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-pal.so.0(opal_libevent2021_event_base_loop+0x8bc)[0x7ffff7b1076c]
>>>> [vegas12:13887] [ 3]
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(orterun+0x126d)[0x40501d]
>>>> [vegas12:13887] [ 4]
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(main+0x20)[0x4039e4]
>>>> [vegas12:13887] [ 5]
>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>>> [vegas12:13887] [ 6]
>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun[0x403909]
>>>> [vegas12:13887] *** End of error message ***
>>>> Segmentation fault (core dumped)
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 2:24 PM, Mike Dubman <mi...@dev.mellanox.co.il>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> This fix "orte_rtc_base_select: skip a RTC module if it has a zero
>>>>> priority" did not help and Jenkins still fails as before.
>>>>> Open MPI was configured with:
>>>>> --with-platform=contrib/platform/mellanox/optimized
>>>>> --with-ompi-param-check --enable-picky --with-knem --with-mxm --with-fca
>>>>>
>>>>> The run was on a single node:
>>>>>
>>>>> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>>>>  -np 8 -mca btl sm,tcp 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>>>>> [vegas12:13834] *** Process received signal ***
>>>>> [vegas12:13834] Signal: Segmentation fault (11)
>>>>> [vegas12:13834] Signal code: Address not mapped (1)
>>>>> [vegas12:13834] Failing at address: (nil)
>>>>> [vegas12:13834] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>>>> [vegas12:13834] [ 1] /lib64/libc.so.6(fgets+0x2d)[0x3937466f2d]
>>>>> [vegas12:13834] [ 2] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_rtc_freq.so(+0x1f3f)[0x7ffff41f5f3f]
>>>>> [vegas12:13834] [ 3] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_rtc_freq.so(+0x279b)[0x7ffff41f679b]
>>>>> [vegas12:13834] [ 4] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_rtc_base_select+0xe6)[0x7ffff7ddc036]
>>>>> [vegas12:13834] [ 5] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_ess_hnp.so(+0x4056)[0x7ffff725b056]
>>>>> [vegas12:13834] [ 6] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_init+0x174)[0x7ffff7d97254]
>>>>> [vegas12:13834] [ 7] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(orterun+0x863)[0x404613]
>>>>> [vegas12:13834] [ 8] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(main+0x20)[0x4039e4]
>>>>> [vegas12:13834] [ 9] 
>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>>>> [vegas12:13834] [10] 
>>>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun[0x403909]
>>>>> [vegas12:13834] *** End of error message ***
>>>>> Segmentation fault (core dumped)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 2, 2014 at 10:19 AM, Gilles Gouaillardet <
>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>> Mike and Ralph,
>>>>>>
>>>>>> I could not find a simple workaround.
>>>>>>
>>>>>> For the time being, I committed r31926 and invite those who face a
>>>>>> similar issue to use the following workaround:
>>>>>> export OMPI_MCA_rtc_freq_priority=0
>>>>>> /* or mpirun --mca rtc_freq_priority 0 ... */
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 2, 2014 at 3:45 PM, Gilles Gouaillardet <
>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>
>>>>>>> In orte/mca/rtc/freq/rtc_freq.c at line 187:
>>>>>>> fp = fopen(filename, "r");
>>>>>>> where filename is
>>>>>>> "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor".
>>>>>>>
>>>>>>> There is no error check, so if fp is NULL, orte_getline() will call
>>>>>>> fgets(), which will crash.
>>>>>>>
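>>>>>>> As an illustration only (this is not the actual Open MPI code), here is
>>>>>>> a minimal standalone sketch of the defensive pattern that would avoid
>>>>>>> the crash on systems without cpufreq; read_governor is a hypothetical
>>>>>>> helper, roughly what the fopen()/orte_getline() path should do:
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <string.h>
>>>>>>>
>>>>>>> /* Read the first line of the scaling_governor pseudo-file, returning
>>>>>>>  * NULL instead of crashing when the file does not exist. */
>>>>>>> static char *read_governor(const char *filename)
>>>>>>> {
>>>>>>>     char buf[128];
>>>>>>>     FILE *fp = fopen(filename, "r");
>>>>>>>
>>>>>>>     if (NULL == fp) {       /* no cpufreq support on this node */
>>>>>>>         return NULL;        /* caller must handle this case */
>>>>>>>     }
>>>>>>>     if (NULL == fgets(buf, sizeof(buf), fp)) {
>>>>>>>         fclose(fp);
>>>>>>>         return NULL;
>>>>>>>     }
>>>>>>>     fclose(fp);
>>>>>>>     buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
>>>>>>>     return strdup(buf);
>>>>>>> }
>>>>>>>
>>>>>>> int main(void)
>>>>>>> {
>>>>>>>     char *gov =
>>>>>>>         read_governor("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
>>>>>>>     printf("governor: %s\n", gov ? gov : "<not available>");
>>>>>>>     free(gov);
>>>>>>>     return 0;
>>>>>>> }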
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/06/14939.php
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2014/06/14945.php
>>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/06/14947.php
>>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/06/14948.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/14949.php
>
