Hi All,
I'd like to ask whether someone could provide more clarity on how to
actually configure a cluster to use HPX and how to run an HPX
application (or at least the specific considerations for setting up a
cluster to run one). Since that is general and more than a little
vague, let me be more specific:
- Although it is understood that a system will necessarily use some form
of job management system (as described in the docs under Getting
Started, where examples are provided for PBS and SLURM), there are still
aspects of the configuration which are not obvious (at least to a fairly
lay user such as myself). In particular, there are many configuration
options described in the manual
(http://stellar-group.github.io/hpx/docs/html/hpx/manual/init/commandline.html
for example):
--hpx:worker
--hpx:console
--hpx:connect
--hpx:run-agas-server
--hpx:run-hpx-main
--hpx:hpx arg
--hpx:agas arg
--hpx:run-agas-server-only
etc...
I admit that I haven't really tried to read through the code dealing
with this part of the HPX runtime system. However, I think the
documentation should better clarify (a) what, specifically, each of
these options does, and when it is appropriate to use it; and (b) which
of these options are generally (or always) set by the job management
system (for example via environment variables), and which need to be
set by the user.
Although I cannot find the message off-hand right now, I distinctly
remember messages sent through the mailing list in which someone
provided a particular set of command-line arguments to help a
questioner diagnose problems running their code. In that case, I seem
to recall that one machine was set to run only the AGAS service (or at
least, a specific machine in the cluster was identified as hosting
AGAS), along with some other arguments which I cannot recall right now.
Should there always be a specific machine used for AGAS? Are there
instances where it is required, instances where it is recommended,
and/or instances where it is not necessary at all?
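To make the question more concrete, here is roughly the kind of
invocation I imagine for a manual two-node run with an explicitly
designated AGAS host. The application name, hostnames, and port are
made up, and I have not verified that this combination of flags is
correct; it is only my guess based on the option descriptions:

    # on node0 (the machine I imagine hosting AGAS and the console)
    ./my_hpx_app --hpx:console --hpx:run-agas-server \
                 --hpx:hpx node0:7910 --hpx:agas node0:7910 \
                 --hpx:localities 2

    # on node1 (an additional worker locality)
    ./my_hpx_app --hpx:worker \
                 --hpx:hpx node1:7910 --hpx:agas node0:7910 \
                 --hpx:localities 2

If something along these lines is indeed how a manual (non-batch-system)
launch is supposed to look, then documenting one such complete example
would answer a large part of my question.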
- Perhaps this is a bit of a dumb question, but I'd rather understand
things well than not ask... Under Configuration / Configuration Default
Settings
(http://stellar-group.github.io/hpx/docs/html/hpx/manual/init/configuration/config_defaults.html),
a number of options are set and/or described. For example:
- under " /*The|[hpx|] Configuration Section", */there are various
options here which can be set -- such as hpx.location, hpx.localities,
hpx.os_threads. It would seem reasonable that many of these are set
automatically by the software based on how the program was invoked
through the job management system. Others such as stack size would not
come from the job management system (unless it is specifically passed as
a command-line argument, where possible).
- Under the [hpx.parcel] configuration section, again there are a
number of options such as the parcel address, port, etc. Which of these
are automatically set, and which would need to be set explicitly?
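Again, just to illustrate what I mean (the values are invented; I am
assuming hpx.parcel.address and hpx.parcel.port are the relevant keys,
based on the defaults page):

    [hpx.parcel]
    address = 192.168.1.10
    port = 7910

In particular, I would guess the address has to differ per node, which
makes me wonder how it could sensibly be set in a system-wide file.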
- As I understand from the description of the property
hpx.parcel.mpi.enable, MPI support is automatically detected at
startup, as long as the application itself was started within a
parallel MPI environment. I know this is somewhat outside the scope of
the HPX documentation itself, but a more comprehensive set of examples
using PBS and SLURM for different cases would be greatly appreciated.
Although much of that should primarily be addressed in the SLURM
documentation, some additional examples of running programs would be
extremely helpful to anyone who does not have the benefit of extensive
experience.
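By "additional examples" I mean something as simple as a complete batch
script. For instance, my guess at a minimal SLURM script would look
roughly like this (the application name and resource numbers are made
up, and I am assuming that the ranks started by srun become the HPX
localities when the MPI parcelport is enabled):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=8

    srun ./my_hpx_app --hpx:threads=8

Even a handful of such scripts (TCP vs. MPI parcelport, one vs. several
localities per node) would go a long way.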
- Looking at the [hpx.agas] section, it isn't clear to me how this
should best be configured. This was partially touched on above in the
first half of the question dealing with command-line arguments, but not
completely. Obviously, the default address 127.0.0.1 would only work
for a program running on a single locality. There are a number of
things that are not obvious to me here: (a) Is the preprocessor
constant applicable when HPX itself is compiled, or when an application
using HPX is compiled? (b) Should there generally be a single
locality/cluster node set aside to be the AGAS server? (c) How is the
configuration option hpx.agas.service_mode supposed to be used? My
guess is that, in general, a single AGAS server should be selected for
a cluster in advance, and that the system-wide ini config files should
set this to "bootstrap" on that particular machine and "hosted" on the
others. Is this the case, or have I misunderstood?
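In other words, is the following (purely hypothetical) arrangement what
is intended? The hostnames and port are invented, and I may well have
the meaning of the address key wrong.

On the machine chosen to host AGAS:

    [hpx.agas]
    service_mode = bootstrap
    address = node0.cluster.local
    port = 7910

On all other machines:

    [hpx.agas]
    service_mode = hosted
    address = node0.cluster.local
    port = 7910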
I recognize that the documentation is, of course, a work in progress,
and although it is quite impressive and clear on many points, there are
other points that I find rather unclear. Perhaps this is because most
HPX users have the benefit of peers/colleagues/sysadmins who can
provide this information informally. Unfortunately, right now, I do not
personally have that benefit, and I'm sure there are others in my
situation at present, and there will undoubtedly be more and more of us
as the use of HPX becomes increasingly widespread.
I would be willing, in general, to assist with improving the
documentation to the best of my abilities. If someone can help me
better understand these kinds of issues (or other areas that have
already been identified as needing better documentation), I would be
happy to write them up in a more articulate form that can be published
online.
Thanks and regards,
Shmuel