I don't even think the load can be trusted on labs :P There are various external issues, like NFS, that also affect the load of a box (I've seen boxes on labs with loads like 5462492386276 in the past, and they were just fine).
On Mon, May 26, 2014 at 4:29 PM, Tim Landscheidt <[email protected]> wrote:

> Emilio J. Rodríguez-Posada <[email protected]> wrote:
>
>> These days I'm processing Wikipedia dumps. Today I tried English Wikipedia,
>> which is in 150+ chunks (pages-meta-history*.7z).
>
>> I have a bash script that launches the jsub jobs, one job per chunk, so I
>> queued more than 150 jobs. After that, I saw that 95 of them had
>> started and were spread all over the execution nodes.
>
>> I saw the load on some of the nodes reach 250%. Is this normal? I
>> stopped all of them because I'm not sure whether I have to launch small
>> batches, 10 at a time or so, or whether it is OK to launch them all and
>> ignore the CPU load of the execution nodes.
>
> The grid should keep the average load below 1, but that is
> its job, not yours :-). So launching 150 jobs is totally
> fine. If you see a load of more than 100 % for a prolonged
> time, notifying an admin doesn't hurt, but due to the nature
> of the system -- the grid can only guess what the /future/
> load of a job will be -- outliers are to be expected.
>
> Tim
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
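For what it's worth, a submit loop along the lines Emilio describes can be as simple as the sketch below. The directory layout, the worker script name (process_chunk.sh), and the job-name prefix are all my own placeholders, not anything from the thread; the loop just prints the jsub command lines so you can eyeball them before submitting for real.

```shell
#!/bin/bash
# Sketch: queue one grid job per English Wikipedia history chunk.
# DUMP_DIR and process_chunk.sh are hypothetical; adjust to your setup.
submit_chunks() {
    local dump_dir="$1"
    local chunk
    for chunk in "$dump_dir"/pages-meta-history*.7z; do
        # Skip the literal glob if no chunks matched.
        [ -e "$chunk" ] || continue
        # Dry run: drop the leading 'echo' to actually submit with jsub.
        echo jsub -N "dump-$(basename "$chunk" .7z)" ./process_chunk.sh "$chunk"
    done
}
```

As Tim says, there is no need to throttle this yourself; submit all 150+ and let the scheduler spread them across execution nodes.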
