Hey,

On Fri, Oct 31, 2014 at 2:45 PM, Christian Lohmaier
<lohma...@googlemail.com> wrote:
> Hi Markus, *,
>
> On Fri, Oct 31, 2014 at 2:38 PM, Markus Mohrhard
> <markus.mohrh...@googlemail.com> wrote:
>>
>> The quick and ugly one is to partition the directories into 100-file
>> directories. I have a script for that, as I have done exactly that for
>> the memcheck run on the 70-core Largo server. It is a quick and ugly
>> implementation.
>> The clean and much better solution is to move away from directory-based
>> invocation and partition by files on the fly.
>
> Yeah, I also thought of keeping the per-directory/filetype processing,
> but instead of running multiple dirs at once, rather dividing the set
> of files of a given dir into <number of workers> chunks.
>
>> I have a proof-of-concept somewhere on my machine and will push a
>> working version during the next days.
>
> nice :-)
>
So a working version is currently running on the VM. The version in the
repo will be updated as soon as the script finishes without a problem.

The script now parallelizes nearly perfectly: it divides the work into
100-file chunks and works through them. With the last update of the test
files that means 641 jobs that are put into a queue, and we process as
many jobs in parallel as we want (5 on the VM at the moment).

Additionally, the updated version of the script no longer hard-codes a
mapping from file extension to component; instead it queries LibreOffice
to see which component actually opened the file. That allows removing
quite a few mappings and means all file types are imported, whereas the
old version only imported file types that were registered.

The new script should scale nearly perfectly. There are still a few
enhancements on my list, so if anyone is interested in python tasks
please talk to me.

Regards,
Markus
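P.S. For anyone curious about the python side before digging into the
script itself, here is a rough sketch of the chunk-queue idea. This is
not the actual script; CHUNK_SIZE, NUM_WORKERS and process_chunk() are
made-up names for illustration:

    # Minimal sketch of the chunked work queue described above.
    import concurrent.futures

    CHUNK_SIZE = 100   # files per job
    NUM_WORKERS = 5    # parallel jobs (5 on the VM at the moment)

    def process_chunk(chunk):
        # The real script imports each file through LibreOffice here;
        # this stand-in only reports the chunk size.
        return len(chunk)

    def run(files):
        # Partition the flat file list into CHUNK_SIZE-sized jobs ...
        chunks = [files[i:i + CHUNK_SIZE]
                  for i in range(0, len(files), CHUNK_SIZE)]
        # ... and let a fixed-size pool of workers drain the queue.
        with concurrent.futures.ProcessPoolExecutor(NUM_WORKERS) as pool:
            for n in pool.map(process_chunk, chunks):
                print("chunk done, %d files" % n)

The component detection works roughly like this: instead of a
hard-coded extension mapping, load the document and ask which document
service it supports. The service names below are the standard UNO ones;
the connection details are an assumption and depend on how soffice was
started:

    # Sketch of asking LibreOffice which component opened a file,
    # assuming a soffice instance started with
    #   --accept="socket,host=localhost,port=2002;urp;"
    import uno

    DOC_SERVICES = {
        "com.sun.star.text.TextDocument": "writer",
        "com.sun.star.sheet.SpreadsheetDocument": "calc",
        "com.sun.star.presentation.PresentationDocument": "impress",
        "com.sun.star.drawing.DrawingDocument": "draw",
    }

    def connect():
        local_ctx = uno.getComponentContext()
        resolver = local_ctx.ServiceManager.createInstanceWithContext(
            "com.sun.star.bridge.UnoUrlResolver", local_ctx)
        ctx = resolver.resolve(
            "uno:socket,host=localhost,port=2002;urp;"
            "StarOffice.ComponentContext")
        return ctx.ServiceManager.createInstanceWithContext(
            "com.sun.star.frame.Desktop", ctx)

    def get_component(desktop, url):
        # url must be a file URL, e.g. "file:///tmp/test.ods"
        doc = desktop.loadComponentFromURL(url, "_blank", 0, ())
        try:
            for service, component in DOC_SERVICES.items():
                if doc.supportsService(service):
                    return component
            return None   # no registered filter matched
        finally:
            doc.close(False)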