Mike,

Did you apply the patch *and* run with mpirun --mca rtc_freq_priority 0?

*Both* are required (--mca rtc_freq_priority 0 is not enough without the
patch).

Can you please confirm there is no
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
(pseudo) file on your system?
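
You can check with e.g.
ls /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
which should report "No such file or directory" on an affected node.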

If this still does not work for you, then this might be a different issue I
was unable to reproduce.
In that case, could you run mpirun under gdb and send a gdb stack trace?
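
For example:
gdb --args mpirun -np 8 -mca btl sm,tcp ./examples/hello_usempi
then type "run", and once it segfaults, "bt" prints the backtrace
(adjust the mpirun path and options to your install).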


Cheers,

Gilles




On Mon, Jun 2, 2014 at 8:26 PM, Mike Dubman <mi...@dev.mellanox.co.il>
wrote:

> more info: specifying --mca rtc_freq_priority 0 explicitly generates a
> different kind of failure:
>
> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
> -np 8 -mca btl sm,tcp --mca rtc_freq_priority 0
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
> [vegas12:13887] *** Process received signal ***
> [vegas12:13887] Signal: Segmentation fault (11)
> [vegas12:13887] Signal code: Address not mapped (1)
> [vegas12:13887] Failing at address: 0x20
> [vegas12:13887] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
> [vegas12:13887] [ 1]
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_plm_base_post_launch+0x90)[0x7ffff7dcbe50]
> [vegas12:13887] [ 2]
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-pal.so.0(opal_libevent2021_event_base_loop+0x8bc)[0x7ffff7b1076c]
> [vegas12:13887] [ 3]
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(orterun+0x126d)[0x40501d]
> [vegas12:13887] [ 4]
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(main+0x20)[0x4039e4]
> [vegas12:13887] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
> [vegas12:13887] [ 6]
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun[0x403909]
> [vegas12:13887] *** End of error message ***
> Segmentation fault (core dumped)
>
>
> On Mon, Jun 2, 2014 at 2:24 PM, Mike Dubman <mi...@dev.mellanox.co.il>
> wrote:
>
>> Hi,
>> This fix ("orte_rtc_base_select: skip a RTC module if it has a zero
>> priority") did not help and jenkins still fails as before.
>> OMPI was configured with:
>> --with-platform=contrib/platform/mellanox/optimized
>> --with-ompi-param-check --enable-picky --with-knem --with-mxm --with-fca
>>
>> The run was on a single node:
>>
>> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>  -np 8 -mca btl sm,tcp 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>> [vegas12:13834] *** Process received signal ***
>> [vegas12:13834] Signal: Segmentation fault (11)
>> [vegas12:13834] Signal code: Address not mapped (1)
>> [vegas12:13834] Failing at address: (nil)
>> [vegas12:13834] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>> [vegas12:13834] [ 1] /lib64/libc.so.6(fgets+0x2d)[0x3937466f2d]
>> [vegas12:13834] [ 2] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_rtc_freq.so(+0x1f3f)[0x7ffff41f5f3f]
>> [vegas12:13834] [ 3] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_rtc_freq.so(+0x279b)[0x7ffff41f679b]
>> [vegas12:13834] [ 4] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_rtc_base_select+0xe6)[0x7ffff7ddc036]
>> [vegas12:13834] [ 5] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_ess_hnp.so(+0x4056)[0x7ffff725b056]
>> [vegas12:13834] [ 6] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libopen-rte.so.0(orte_init+0x174)[0x7ffff7d97254]
>> [vegas12:13834] [ 7] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(orterun+0x863)[0x404613]
>> [vegas12:13834] [ 8] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun(main+0x20)[0x4039e4]
>> [vegas12:13834] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>> [vegas12:13834] [10] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun[0x403909]
>> [vegas12:13834] *** End of error message ***
>> Segmentation fault (core dumped)
>>
>>
>>
>>
>> On Mon, Jun 2, 2014 at 10:19 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Mike and Ralph,
>>>
>>> I could not find a simple fix.
>>>
>>> For the time being, I committed r31926 and invite those who face a
>>> similar issue to use the following workaround:
>>> export OMPI_MCA_rtc_freq_priority=0
>>> /* or mpirun --mca rtc_freq_priority 0 ... */
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>>
>>>
>>> On Mon, Jun 2, 2014 at 3:45 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> In orte/mca/rtc/freq/rtc_freq.c at line 187:
>>>> fp = fopen(filename, "r");
>>>> where filename is "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor".
>>>>
>>>> There is no error check, so if fp is NULL (the file does not exist),
>>>> orte_getline() will call fgets() on a NULL stream and crash.
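>>>>
>>>> A minimal sketch of the missing check (illustrative only; the exact
>>>> error handling and return code are assumptions, not the committed fix):
>>>>
>>>> fp = fopen(filename, "r");
>>>> if (NULL == fp) {
>>>>     /* no cpufreq sysfs entry on this node: bail out instead of
>>>>      * letting orte_getline() call fgets() on a NULL stream */
>>>>     return ORTE_ERR_NOT_SUPPORTED;
>>>> }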
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/06/14939.php
>>>
>>
>>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/06/14945.php
>
