To whom it may concern,

Several other people on the bug have described how adding the appropriate
lines will solve the issue Alexey is having. I understand the need for an
"update friendly" experience from EPEL, as it supports software on RHEL.
However, Alexey is ignoring a large part of the torque community that wants
NUMA support enabled. I would consider the lack of NUMA support to be one
of the "Serious bugs that cannot be fixed in the existing version" of
torque, as I doubt you can buy a laptop without some NUMA in the CPU. This
support is critical for HPC applications, which need to be able to lock
adjacent processes to adjacent CPUs on modern NUMA systems.

This does, however, bring to mind something that's been growing in my head
for a while now. I think we need a Fedora HPC SIG to help communicate to
Fedora (and then hopefully Red Hat) how HPC clusters work and what kinds of
support the HPC community needs when it comes to the software we run. The
EPEL Guidelines Digest
(https://fedoraproject.org/wiki/EPEL/GuidelinesAndPolicies#Digest) isn't
wrong; the scope is just different for HPC systems and specifically the HPC
software we want. The scope for an HPC system is focused on the life cycle
of the cluster we purchase, not the life cycle of the RHEL version of the OS.
Furthermore, the things we need updated when we bring a new cluster online
are the newest compiler, resource management, MPI and cluster management
software, as the software we run on the HPC systems often requires the
newest of those things to take advantage of the newest features in the
hardware we just purchased. Others, like Alexey, have larger community
supported software stacks that they have to integrate on site with whatever
hardware they are tasked with using; the changes for that life cycle are
based on what the larger community needs. Alexey will do updates when the
community documentation has changed and he's required to update to
continue to be a part of the community.

My argument for having an HPC SIG is that it would bridge the communication
gaps between HPC workflows (mentioned above) and external software like the
OpenHPC project (http://www.openhpc.community/) and EasyBuild
(https://github.com/hpcugent/easybuild). These projects focus on
building software on RHEL systems but don't follow any of the guidelines
we have in place, and thus produce substandard RPMs that most HPC
administrators have to deal with. There are some very important features
in these projects that the Fedora package manager should try to support.
Alexey isn't wrong in his assessment of the situation; he's just in
a different part of his cluster's life cycle. If he bought a cluster
tomorrow to replace the current one, he might feel differently about the
NUMA support (I don't know). However, other users have bought clusters and
are more than likely taking advantage of the NUMA support to gain
performance on their systems.

Just my thoughts...

Thanks,
- David Brown



On 4/13/16, 11:00 AM, "[email protected]" <[email protected]>
wrote:

>Hi,
>
>We use torque from EPEL 6 on our production cluster.
>Our cluster is part of a large set of clusters that use packages from the
>EPEL repository:
>https://twiki.cern.ch/twiki/bin/view/EMI/GenericInstallationConfigurationEMI3
>
>Everything worked fine until torque was updated to torque-4.2.10-9.el6 with
>NUMA enabled:
>https://bugzilla.redhat.com/show_bug.cgi?id=1231148
>
>This update doesn't start at all. I filed a bug against torque:
>https://bugzilla.redhat.com/show_bug.cgi?id=1321154
>
>It was suggested there to remove NUMA support and build a separate set of
>packages with NUMA enabled.
>
>But the maintainer's solution was to add a file that is required for
>starting the pbs_mom service.
>Just installing the torque-4.2.10-10.el6 update still doesn't work (the
>service starts but the nodes are down).
>This solution needs additional reconfiguration of pbs_server, but we can't
>do such experiments on our cluster because there is no guarantee that
>after reconfiguration everything will work as expected.
>
>EPEL update policy recommends avoiding updates that cause such problems:
>https://fedoraproject.org/wiki/EPEL_Updates_Policy#Stable_Releases
>
>-- 
>Alexey Kurov <[email protected]>
_______________________________________________
epel-devel mailing list
[email protected]
http://lists.fedoraproject.org/admin/lists/[email protected]
