Yeah, I meant executed, not submitted. Sorry for the confusion.

On May 26, 2014, at 2:45 PM, Tim Landscheidt <[email protected]> wrote:
> Maximilian Doerr <[email protected]> wrote in a slightly
> different order:
>
>>>> These days I'm processing Wikipedia dumps. Today I tried English Wikipedia,
>>>> which is in 150+ chunks (pages-meta-history*.7z).
>
>>>> I have a bash script that launches the jsub jobs, one job per chunk, so I
>>>> queued more than +150 jobs. After that, I saw that 95 jobs of them were
>>>> started and spread all over the execution nodes.
>
>>>> I saw the load of some of the nodes to reach 250%, is this normal? I
>>>> stopped all them because I'm not sure if I have to launch small batches, 10
>>>> each time or so, or it is OK to launch all them and ignore the CPU load of
>>>> execution nodes.
>
>>> The grid should keep the average load below 1, but that is
>>> its job, not yours :-). So launching 150 jobs is totally
>>> fine. If you see a load of more than 100 % for a prolonged
>>> time, notifying an admin doesn't hurt, but due to the nature
>>> of the system -- the grid can only guess what the /future/
>>> load of a job will be -- outliers are to be expected.
>
>> Wait. The grid should have a limit of 15. I've hit that limit so many
>> times, I received my own exec node.
>
> No, the grid should have no limit for the number of jobs
> submitted, but limit the number of jobs executed in parallel
> per user. Apparently, the latter got lost during the
> migration from pmtpa to eqiad. I've filed
> https://bugzilla.wikimedia.org/65777 for that.
>
> Tim
>
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
