Hi All,

Could someone clarify how to configure a cluster to use HPX and how to run an HPX application on it (or at least which considerations matter when setting up a cluster to run one)? Since that is general and more than a little vague, let me be more specific:

- I understand that a system will necessarily use some form of job management system (as described in the docs under Getting Started, where examples are provided for PBS and SLURM). Even so, some aspects of the configuration are not obvious, at least to a fairly lay user such as myself. In particular, the manual describes many command-line options (see, for example, http://stellar-group.github.io/hpx/docs/html/hpx/manual/init/commandline.html):

--hpx:worker
--hpx:console
--hpx:connect
--hpx:run-agas-server
--hpx:run-hpx-main
--hpx:hpx arg
--hpx:agas arg
--hpx:run-agas-server-only

etc...

I admit that I haven't really tried to read through and understand the code dealing with this part of the HPX runtime system. However, I think the documentation should clarify (a) what, specifically, each of these options does and when a particular option should be used; and (b) which of these options are generally (or always) set by the job management system (by setting environment variables, for example) and which generally need to be set by the user.

Although I cannot find the message right now, I distinctly remember a thread on the mailing list in which someone gave a questioner a particular set of command-line arguments to help diagnose problems running their code. In that case, I seem to recall that one machine was set to run only the AGAS (or at least, a specific machine in the cluster was identified as hosting the AGAS), along with some other arguments that I cannot recall. Should there always be a specific machine used for the AGAS? Are there instances where it is required, instances where it is recommended, and/or instances where it is not necessary at all?

- Perhaps this is a bit more of a dumb question, but I'd rather understand things well than not ask... The page on configuration default settings (http://stellar-group.github.io/hpx/docs/html/hpx/manual/init/configuration/config_defaults.html) sets and/or describes a number of options. For example:

  - Under "The [hpx] Configuration Section", there are various options which can be set, such as hpx.location, hpx.localities, hpx.os_threads. It would seem reasonable that many of these are set automatically by the software based on how the program was invoked through the job management system. Others, such as stack size, would not come from the job management system (unless specifically passed as a command-line argument, where possible).

  - Under the [hpx.parcel] configuration section, there are again a number of options, such as the parcel address, port, etc. Which of these are set automatically, and which need to be set explicitly?

  - As I understand from the description of the property hpx.parcel.mpi.enable, this is detected automatically at startup, as long as the application itself was started within a parallel MPI environment. I know this is somewhat outside the scope of the documentation itself, but a more comprehensive set of examples using PBS and SLURM for different cases would be greatly appreciated. Although that should primarily be addressed in the SLURM documentation, some additional examples of running programs would be extremely helpful to anyone who does not have the benefit of extensive experience.

  - Looking at the [hpx.agas] section, it isn't clear to me how this should best be configured. This was partially covered above in the questions about command-line arguments, but not completely. Obviously, the default address 127.0.0.1 would only work for a program running on a single locality. A number of things are not obvious to me here: (a) Is the preprocessor constant applicable when HPX is compiled, or when an application using HPX is compiled? (b) Should there generally be a single locality/cluster node set aside to be the AGAS server? (c) How is the configuration option hpx.agas.service_mode supposed to be used? It seems that, in general, a single AGAS server should be selected for a cluster in advance, and that the system-wide ini config files should set this option to "bootstrap" on that particular machine and "hosted" on the other machines. Is this the case, or have I misunderstood?


I recognize that the documentation is, of course, a work in progress. Although it is quite impressive and clear on a lot of points, I find some other points rather unclear. Perhaps this is because most users of HPX have peers, colleagues, or sysadmins who can provide this information informally. Unfortunately, I do not have that benefit right now. I'm sure others are in the same situation, and there will undoubtedly be more and more of us as HPX becomes more widely used.


I would be willing to help improve the documentation to the best of my abilities. If someone can help me better understand these issues (or other areas already identified as needing better documentation), I would be willing to write them up in a more articulate form that can be published online.


Thanks and regards,

Shmuel


_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users