Hey,

On Fri, Oct 31, 2014 at 2:23 PM, Christian Lohmaier
<[email protected]> wrote:
> Hi *,
>
> On Thu, Oct 30, 2014 at 5:39 PM, Michael Meeks
> <[email protected]> wrote:
>>
>> * Crashtest futures / automated test scripts (Markus)
>>     + call on Tuesday; new testing hardware.
>>     + result - get a Manitu server & leave room in the budget for
>>       ondemand Amazon instances (with spot pricing) if there is
>>       special need at some point.
>> [...]
>
> When I played with the crashtest setup I noticed some limitations in
> its current layout that prevent simply using lots of cores / high
> parallelism to get faster results.
>
> The problem is that the run is parallelized per directory, but the
> number of files per directory is not evenly distributed at all. So
> when the script happens to start the odt tests last, the whole set of
> odt files is tested in a single thread, leaving the other CPU cores
> idling with nothing to do.
>
> I did add a sorting statement to the script so that it starts with
> the directories containing the most files [1], but even with that you
> run into the problem that towards the end of the test run not all
> cores are used. As the AMD Opterons in the Manitu machines are less
> capable per CPU, this limits how much you can accelerate the run just
> by assigning more cores to it.
>
> I didn't look into the overall setup to know whether segmenting the
> large directories into smaller ones is easy to do or not (i.e.
> instead of having one odt dir with 10500+ files, have 20 with ~500
> files each).
>
> ciao
> Christian
>
> [1] added a sorted() call that uses the number of files in the
> directory as the key to sort by:
>
> import os
>
> def get_numfiles(directory):
>     return len(os.listdir(directory))
>
> def get_directories():
>     d = '.'
>     directories = [o for o in os.listdir(d)
>                    if os.path.isdir(os.path.join(d, o))]
>     return sorted(directories, key=get_numfiles, reverse=True)
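For illustration, a minimal sketch of the segmenting idea Christian asks
about: moving the files of one oversized directory into fixed-size
sibling directories. The helper name, the naming scheme and the chunk
size are made up here, not part of the actual crashtest scripts.

    import os
    import shutil

    def split_directory(src, chunk_size=500):
        """Move the files in src into sibling directories src-000,
        src-001, ... holding at most chunk_size files each."""
        files = sorted(f for f in os.listdir(src)
                       if os.path.isfile(os.path.join(src, f)))
        for i in range(0, len(files), chunk_size):
            chunk = '%s-%03d' % (src, i // chunk_size)
            os.makedirs(chunk, exist_ok=True)
            for name in files[i:i + chunk_size]:
                shutil.move(os.path.join(src, name),
                            os.path.join(chunk, name))

    # split_directory('odt') would turn one 10500-file odt directory
    # into odt-000 .. odt-020 with at most 500 files each.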
This is currently a known limitation, but there are two solutions to
the problem.

The quick and ugly one is to partition the directories into 100-file
directories. I have a script for that, as I did exactly that for the
memcheck run on the 70-core Largo server. It is a quick and ugly
implementation.

The clean and much better solution is to move away from directory-based
invocation and partition by files on the fly. I have a proof-of-concept
somewhere on my machine and will push a working version during the next
days. This would even gain us about half a day on our current setup, as
ods and odt are normally the last two directories running, for about
half a day longer than the rest of the script.

With both solutions this scales perfectly. We have already tested this
on the Largo server, where I was able to keep a load of 70 for exactly
a week (with memcheck, but that only affects the overall runtime).

Regards,
Markus

_______________________________________________
LibreOffice mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libreoffice
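A minimal sketch of the file-level partitioning Markus describes,
assuming Python's multiprocessing; collect_files and test_file are
hypothetical stand-ins for the real per-document import test, not his
actual proof-of-concept:

    import multiprocessing
    import os

    def collect_files(root='.'):
        """Yield every test document under root, ignoring the
        directory layout entirely."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    def test_file(path):
        """Hypothetical stand-in: run the import test on one file."""
        return path

    def run(cores=70):
        # Each worker pulls single files from one shared queue, so all
        # cores stay busy until the last few documents instead of
        # idling once "their" directory is exhausted.
        with multiprocessing.Pool(processes=cores) as pool:
            for _ in pool.imap_unordered(test_file, collect_files(),
                                         chunksize=10):
                pass

    if __name__ == '__main__':
        run()

Because the unit of work is a single file rather than a whole
directory, the tail of the run shrinks from one thread grinding through
10500+ odt files to at most a handful of idle workers during the final
few documents.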
