Hi Ralph,
I'm sorry to bother you again, but adding the new component to rds still doesn't work as expected. I've created a new component, rds_mosix. It is identical to rds_hostfile (with all parameter names changed) except:

in rds/mosix/rds_mosix_component.c, function orte_rds_mosix_open:

    mca_base_param_reg_string("rds_hostfile", "path", "ORTE Host filename", false, false, path, &mca_rds_mosix_component.path);

in rds/mosix/rds_mosix.c, function orte_rds_mosix_query:

    rc = mca_base_param_find("rds", "hostfile", "path");
    mca_base_param_lookup_string(rc, &mca_rds_mosix_component.path);
    printf("got hostfile: %s\n", mca_rds_mosix_component.path);

So I'm running:

    mpirun --mca rmaps round_robin --mca rds mosix --hostfile $MOSHOME/4hosts -np 2 hostname

and getting the output "got hostfile: <default_hostfile_path>" rather than the given path. What am I doing wrong?

Thank you
--David

---------- Forwarded message ----------
From: Ralph Castain <r...@lanl.gov>
Date: Oct 20, 2007 6:52 PM
Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
To: David Erukhimovich <davider...@cs.huji.ac.il>

On 10/20/07 10:10 AM, "David Erukhimovich" <davider...@cs.huji.ac.il> wrote:

> Hi Ralph,
>
> 2. I do want the user to be able to switch between my way of process
> launching and the default way. I can do it using an mca flag, but I would
> prefer a new component. If it is not too difficult for you, please make the
> patch; if it is, I'll just use an mca flag.

I can make it next week - shouldn't be too big a deal. I'll let you know if otherwise.

> 1. I just remembered another difficulty I had: I've created a new rds
> component identical to the hostfile one; let's call it mosix. Now, orterun
> saves the hostfile path in an mca parameter, rds_hostfile_path or
> something like that. When I try to retrieve rds_hostfile_path or
> rds_mosix_path in the rds_mosix component, I always get the default
> hostfile path (it doesn't matter whether I gave a hostfile or not).
> And I tried everything: changing names in rds_mosix_component, declaring a
> new parameter rds_mosix_path in various places, etc. So now I'm just
> altering the existing hostfile component.
> Do you have any suggestions on how to make it work?

How are you retrieving the path? Here is the code from hostfile:

    mca_base_param_reg_string(&mca_rds_hostfile_component.super.rds_version, "path", "ORTE Host filename", false, false, path, &mca_rds_hostfile_component.path);

If you look at that, it is actually looking for an mca param of "rds_hostfile_path". If you just copied this code, though, using your component's name, then you would be looking for the mca param "rds_<your-components-name>_path".

What you probably need to do is hardwire it to:

    mca_base_param_reg_string("rds_hostfile", "path", "ORTE Host filename", false, false, path, &default_path);

Also, you may be encountering a problem in that the rds_hostfile component is going to try to run as well as your component, and thus may overwrite what you do. You might want to try -mca rds my_component to ensure that only your component gets executed.

> Sorry for all the questions, and thank you very much for the quick answers.

Not a problem - hope this helps.
Ralph

> Regards
> --David
>
> ---------- Forwarded message ----------
> From: Ralph Castain <r...@lanl.gov>
> Date: Oct 20, 2007 5:12 PM
> Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
> To: David Erukhimovich <davider...@cs.huji.ac.il>
>
> Hi David
>
> Thanks for the info - see comments below.
>
> Ralph
>
>
> On 10/20/07 6:58 AM, "David Erukhimovich" <davider...@cs.huji.ac.il> wrote:
>
>> Hi
>> Thank you for your answer.
>>
>> First of all, my two questions weren't connected; they belong to
>> different parts of my project, and the subject of the mail should have
>> been "Trying to get total procs num in rds framework" (sorry, my mistake).
>>
>> Here are the parts, in the order of the last email:
>>
>> 1.
>> I've solved the problem of getting the total number of procs in rds (I
>> had just called a function incorrectly), so sorry for bothering you about
>> that.
>> Now a bit more about what I'm trying to do; maybe there is a better way
>> than mine:
>> I have a tool (an external application) that, given a list of machines and
>> a number n, chooses the n best (least loaded) machines from the list. If
>> the list of machines isn't given, it just returns the n best machines in
>> the cluster. I want to include this in ompi: given a machinefile, it will
>> run the processes only on the best nodes; if a machinefile isn't given, it
>> will take the best nodes that my application returns.
>> I think the best place to implement it is in rds, after building the list
>> of newly discovered nodes: if the list is empty, fill it using my tool;
>> otherwise, filter it using my tool. That seems to me the most logical way
>> to do it. Am I right? I am asking you because I guess you have better
>> knowledge of the ompi architecture.
>
> It sounds like the correct place to me. At some point in the future, you
> could migrate that logic to the RAS instead, but I would just continue as
> you are doing for now.
>
>> 2. The other thing I am trying to do is to make ompi run every process
>> not directly, but through an external program. E.g., if I want to launch
>> the program "hostname", I want the following to be launched:
>> "<my-program> <my-program's-flags> hostname".
>> I figured that the best way to do it is in the odls framework, because
>> there I have the exact execution point.
>
> I guess I wouldn't do it that way if I were doing a project of my own. I
> would just go into the default odls module and hardcode the revised launch.
> I can't see this coming back into the production system, so unless you have
> some reason to want to run both with and without your revision, why go
> through the pain?
>
>> I am currently working on the 1.2.3 release.
>> I don't work on the trunk because I need the patches to be applied to a
>> stable release. Is there a 1.2.x release where the bug is fixed? And if
>> not, when will such a fixed version be stable?
>
> I don't think there are any plans to backport that fix, though I imagine it
> could be done. If not, I could try and create a patch for you next week,
> though I would again suggest you just hardcode your change into the existing
> odls default component to make your life easier.
>
> Ralph
>
>> Thank you
>> --David
>>
>> ---------- Forwarded message ----------
>> From: Ralph Castain <r...@lanl.gov>
>> Date: Oct 17, 2007 11:22 PM
>> Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
>> To: davider...@cs.huji.ac.il
>> Cc: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
>>
>> Hi David
>>
>> I could probably answer your questions better if I had a better
>> understanding of what you are trying to do. For example, looking in the
>> hostfile rds for the number of procs to be launched seems strange, as the
>> functional role of the framework is simply to learn what nodes are
>> available.
>>
>> It would also help to have some idea of what environment you are working
>> in, and how you configured the beast.
>>
>> Please see comments below.
>> Ralph
>>
>>
>> On 10/17/07 2:47 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>
>>> Yo Ralph --
>>>
>>> Can you answer these questions?
>>>
>>> Begin forwarded message:
>>>
>>>> From: David Erukhimovich <davider...@cs.huji.ac.il>
>>>> Date: October 14, 2007 5:08:45 PM EDT
>>>> To: de...@open-mpi.org
>>>> Subject: [OMPI devel] Trying to get total procs num in odls framework
>>>> Reply-To: Open MPI Developers <de...@open-mpi.org>
>>>>
>>>> Hello,
>>>> I have 2 questions:
>>>> 1. I am trying to get the total number of requested processes for the
>>>> job in the 'hostfile' component in rds.
>>>> I took the job object that was given as a parameter, extracted the
>>>> application objects, and checked how many procs each application has.
>>>> The result in every run was 0. As I understand it, this variable is
>>>> updated before the rds part. So what am I doing wrong?
>>
>> Do you mean you took the jobid given to the hostfile RDS (which isn't an
>> object, but just a number) and did an orte_rmgr.get_app_context to get the
>> array of app_contexts? Is there some reason why you would want to do that
>> there?
>>
>> Depending upon what the command line looks like, it is possible for the
>> number of procs to be zero - we allow that option and then fill in the
>> number later. If it was specified, though, we do insert the number in the
>> app_context object.
>>
>> Maybe you could tell me what the command line looks like, the function
>> call you used to get the "application objects", and what field you were
>> looking at when you found zero?
>>
>>>> 2. I've discovered an undocumented framework: odls.
>>
>> It wasn't exactly hidden...we haven't documented it because we are lazy
>> and the existing components cover every known environment (or so we
>> thought). ;-)
>>
>> Is there some special reason to want to create another one?
>>
>>>> I've created a new component for it. The problem is that there is no
>>>> way to switch between the default component and mine (--mca odls
>>>> <my component> doesn't work). Is there a way to switch between odls
>>>> components (I saw bproc there and I guess it is used)?
>>
>> Are you working on the trunk? What r level?
>>
>> Reason I ask: I recently fixed a problem where the command line mca params
>> were not getting passed to the orteds. Your description looks like you
>> haven't picked up that change. If you have updated recently, and you still
>> can't get it to work, then we likely have a lingering problem.
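Ralph's point that an app_context's process count may legitimately be zero (left unspecified on the command line and filled in later) means a total computed in rds by summing per-application counts can come out as 0 or too low. A minimal C sketch of a guarded sum, assuming a hypothetical stand-in struct app_ctx_sketch_t that carries only a num_procs field (the real ORTE app_context object has more fields, and its exact types are not shown in this thread); total_requested_procs is likewise a hypothetical name, not an ORTE function:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ORTE app_context: only the field used here.
 * A num_procs of 0 means the count was not given on the command line and
 * will be filled in later (per Ralph's description in the thread). */
typedef struct {
    const char *app;   /* executable name, e.g. "hostname" */
    size_t num_procs;  /* requested process count, 0 = not yet known */
} app_ctx_sketch_t;

/* Sum the requested process counts over all app_contexts of a job.
 * Sets *all_known to false if any context still has num_procs == 0,
 * in which case the returned total is only a lower bound. */
size_t total_requested_procs(const app_ctx_sketch_t *apps, size_t n,
                             bool *all_known)
{
    size_t total = 0;

    *all_known = true;
    for (size_t i = 0; i < n; ++i) {
        if (apps[i].num_procs == 0) {
            *all_known = false;   /* count deferred; not "zero procs" */
        }
        total += apps[i].num_procs;
    }
    return total;
}
```

For example, a job with { "hostname", 2 } and { "uptime", 0 } yields a total of 2 with all_known set to false, so code relying on the value would need to handle the deferred case instead of trusting the sum.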
>>
>> If I read your subject line correctly, then I am somewhat puzzled. You can
>> look at the orte/mca/odls/base/odls_base_default_fns.c file, the
>> orte_odls_base_default_get_add_procs_data function and see where we get
>> the total number of procs in a job and how that is passed to the orteds.
>> If you have some new environment that the existing odls components can't
>> handle, then I would strongly suggest you at least use the default
>> functions in the base to provide as much support as possible as this will
>> help you to keep pace with changes in the system.
>>
>> I would also welcome feedback on what you encountered that required a new
>> odls component - perhaps we can modify the base support functions to make
>> it fit within one of the existing components.
>>
>> Thanks
>> Ralph
>>
>>>> Thank you,
>>>> --David
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
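The naming issue at the top of this thread comes down to how a full MCA parameter name is assembled. As Ralph explains, the hostfile registration resolves to "rds_hostfile_path", while the same code copied under another component name resolves to "rds_<your-components-name>_path". A minimal C sketch of that <type>_<component>_<param> pattern follows; mca_full_param_name is a hypothetical helper written for illustration only. It is not an Open MPI function and does not model the real mca_base_param registration or lookup calls.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, NOT part of Open MPI: compose a full MCA parameter
 * name following the <type>_<component>_<param> pattern discussed in this
 * thread. If component is NULL or empty, the result is <type>_<param>.
 * Returns 0 on success, -1 on encoding error or if buf is too small. */
int mca_full_param_name(const char *type, const char *component,
                        const char *param, char *buf, size_t len)
{
    int n;

    if (component != NULL && component[0] != '\0') {
        n = snprintf(buf, len, "%s_%s_%s", type, component, param);
    } else {
        n = snprintf(buf, len, "%s_%s", type, param);
    }
    /* snprintf returns the length it wanted to write; >= len means truncation */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With ("rds", "hostfile", "path") the sketch yields "rds_hostfile_path", and with ("rds", "mosix", "path") it yields "rds_mosix_path", which is a different parameter. Passing "rds_hostfile" as the type with no component also yields "rds_hostfile_path", mirroring the intent of Ralph's hardwired suggestion; whether the real registration call accepts a plain string in that position is not something this sketch checks.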