Maybe... 3rd time's the charm...?  (This time from Opera).

On Wed, Jan 4, 2012 at 5:11 PM, Dawid Weiss
<[email protected]> wrote:

> I forgot how nasty your beast computer is... 20 slaves?! Remind me how
> many actual (real) cores do you have?

Beast has two 6-core CPUs (X5680 Xeons), so 12 real cores (24 with
hyperthreading).

> Did you experiment with
> different slave numbers? I ask because I noticed that:
>
> 1) it makes little sense to run cpu-intense tests on hyper-cores,
> doesn't yield much if anything,
> 2) you should leave some room for system vm threads (GC, compilers);
> the more VMs, the more room you'll need.

In the past I found somewhere around 20 was good w/ the Python
runner... but I went and tried again!

With the Python runner I see these run times on just the Lucene core tests:

   2 cpus: 72.2 sec
   5 cpus: 35.0 sec
  10 cpus: 28.1 sec
  15 cpus: 26.2 sec
  20 cpus: 26.0 sec
  25 cpus: 27.5 sec

So it seems like beyond 15 JVMs it's not helping much... but then I ran
all tests (well, minus a few intermittently failing ones):

  10 cpus: 88.3 sec
  15 cpus: 80.2 sec
  20 cpus: 77.4 sec
  25 cpus: 76.7 sec
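
To get a rough sense of the speedup/overhead, here's a quick
back-of-the-envelope script over the core-test numbers above (single
runs, so the efficiency numbers are noisy):

    # Speedup/efficiency from the core-test timings above, using the
    # 2-JVM run as the baseline.
    core_times = {2: 72.2, 5: 35.0, 10: 28.1, 15: 26.2, 20: 26.0, 25: 27.5}

    baseline_jvms, baseline_time = 2, core_times[2]
    for jvms, t in sorted(core_times.items()):
        speedup = baseline_time / t
        # efficiency relative to perfect linear scaling from the baseline
        efficiency = speedup / (jvms / baseline_jvms)
        print(f"{jvms:3d} JVMs: {t:5.1f}s  speedup x{speedup:4.2f}  "
              f"efficiency {efficiency:.0%}")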

Those runs were all just on beast, but the Python runner can also send
jobs (hacked up, just using ssh) to other machines... I have two other
non-beasts, on which I ran 3 JVMs each:

  25 + 3 + 3 cpus: 64.7 sec
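
Roughly, the idea is this (a minimal sketch, not the actual
runAllTests.py code; the host names and the run_tests.py worker script
are made up):

    # Local "slaves" are plain forked processes; remote ones are the same
    # command wrapped in ssh.  (Hypothetical host names / worker script.)
    import subprocess

    LOCAL_JVMS = 25
    REMOTE = {"host1": 3, "host2": 3}   # two non-beast machines, 3 JVMs each

    def launch(jvm_id, host=None):
        cmd = ["python", "run_tests.py", f"--slave={jvm_id}"]
        if host is not None:
            cmd = ["ssh", host] + cmd   # only remote slaves need an ssh session
        return subprocess.Popen(cmd)

    procs = [launch(i) for i in range(LOCAL_JVMS)]
    for host, count in REMOTE.items():
        procs += [launch(i, host=host) for i in range(count)]

    for p in procs:
        p.wait()   # wall time = first launch to last finish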

With the new ant runner:

2 cpus:

   [junit4] Slave 0:     0.16 ..    50.68 =    50.52s
   [junit4] Slave 1:     0.16 ..    49.58 =    49.42s
   [junit4] Execution time total: 50.73s
   [junit4] Tests summary: 279 suites, 1546 tests, 4 ignored


5 cpus:

   [junit4] Slave 0:     0.19 ..    21.87 =    21.68s
   [junit4] Slave 1:     0.16 ..    21.86 =    21.70s
   [junit4] Slave 2:     0.16 ..    29.31 =    29.15s
   [junit4] Slave 3:     0.16 ..    26.64 =    26.48s
   [junit4] Slave 4:     0.19 ..    29.82 =    29.63s
   [junit4] Execution time total: 29.89s
   [junit4] Tests summary: 279 suites, 1546 tests, 4 ignored

10 cpus:

   [junit4] Slave 0:     0.21 ..    14.62 =    14.41s
   [junit4] Slave 1:     0.22 ..    17.21 =    16.99s
   [junit4] Slave 2:     0.23 ..    18.79 =    18.56s
   [junit4] Slave 3:     0.23 ..    22.99 =    22.76s
   [junit4] Slave 4:     0.20 ..    27.39 =    27.19s
   [junit4] Slave 5:     0.19 ..    27.23 =    27.04s
   [junit4] Slave 6:     0.23 ..    20.40 =    20.17s
   [junit4] Slave 7:     0.19 ..    26.52 =    26.33s
   [junit4] Slave 8:     0.24 ..    26.42 =    26.18s
   [junit4] Slave 9:     0.22 ..    23.57 =    23.35s
   [junit4] Execution time total: 27.52s
   [junit4] Tests summary: 279 suites, 1546 tests, 4 ignored

15 cpus:

   [junit4] Slave 0:     0.29 ..     5.16 =     4.87s
   [junit4] Slave 1:     0.26 ..    15.36 =    15.10s
   [junit4] Slave 2:     0.26 ..    12.99 =    12.73s
   [junit4] Slave 3:     0.29 ..    24.20 =    23.92s
   [junit4] Slave 4:     0.26 ..    27.00 =    26.74s
   [junit4] Slave 5:     0.33 ..    19.97 =    19.63s
   [junit4] Slave 6:     0.31 ..    25.29 =    24.98s
   [junit4] Slave 7:     0.24 ..    28.92 =    28.68s
   [junit4] Slave 8:     0.33 ..    23.67 =    23.34s
   [junit4] Slave 9:     0.43 ..    24.43 =    24.00s
   [junit4] Slave 10:     0.40 ..    27.61 =    27.21s
   [junit4] Slave 11:     0.22 ..    21.77 =    21.56s
   [junit4] Slave 12:     0.22 ..    26.78 =    26.56s
   [junit4] Slave 13:     0.26 ..    25.92 =    25.66s
   [junit4] Slave 14:     0.35 ..    27.77 =    27.42s
   [junit4] Execution time total: 28.98s
   [junit4] Tests summary: 279 suites, 1546 tests, 4 ignored

20 cpus:

   [junit4] Slave 0:     0.35 ..    23.32 =    22.97s
   [junit4] Slave 1:     0.30 ..    24.32 =    24.02s
   [junit4] Slave 2:     0.35 ..    21.35 =    21.00s
   [junit4] Slave 3:     0.37 ..    23.63 =    23.26s
   [junit4] Slave 4:     0.38 ..    20.74 =    20.35s
   [junit4] Slave 5:     0.30 ..    19.74 =    19.44s
   [junit4] Slave 6:     0.36 ..    26.39 =    26.03s
   [junit4] Slave 7:     0.46 ..    23.64 =    23.18s
   [junit4] Slave 8:     0.43 ..    22.44 =    22.02s
   [junit4] Slave 9:     0.30 ..    24.05 =    23.76s
   [junit4] Slave 10:     0.41 ..    24.75 =    24.33s
   [junit4] Slave 11:     0.30 ..    22.66 =    22.36s
   [junit4] Slave 12:     0.30 ..    24.93 =    24.62s
   [junit4] Slave 13:     0.40 ..    24.39 =    24.00s
   [junit4] Slave 14:     0.24 ..    24.47 =    24.23s
   [junit4] Slave 15:     0.45 ..    25.23 =    24.78s
   [junit4] Slave 16:     0.34 ..    23.06 =    22.72s
   [junit4] Slave 17:     0.23 ..    23.50 =    23.28s
   [junit4] Slave 18:     0.30 ..    24.27 =    23.97s
   [junit4] Slave 19:     0.30 ..    24.91 =    24.61s
   [junit4] Execution time total: 26.52s
   [junit4] Tests summary: 279 suites, 1546 tests, 4 ignored
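
The per-slave times also show how well balanced the work is; for
example, a quick look at the 20-JVM run above (numbers copied straight
from the [junit4] output):

    # Per-slave times from the 20-JVM run above.
    slave_times = [22.97, 24.02, 21.00, 23.26, 20.35, 19.44, 26.03, 23.18,
                   22.02, 23.76, 24.33, 22.36, 24.62, 24.00, 24.23, 24.78,
                   22.72, 23.28, 23.97, 24.61]
    total = 26.52   # reported "Execution time total"

    print(f"slowest slave {max(slave_times):.2f}s, fastest {min(slave_times):.2f}s, "
          f"spread {max(slave_times) - min(slave_times):.2f}s")
    print(f"overhead beyond the slowest slave: {total - max(slave_times):.2f}s")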

I only ran each configuration once and the results are likely noisy...
so it's hard to pick a best CPU count...

>> Does the "Execution time total" include compilation, or is it just the
>> actual test runtime?
>
> The total is calculated before slave VMs are launched and after they
> complete, so even launch time is included. It's here:
> https://github.com/carrotsearch/randomizedtesting/blob/master/ant-junit4/src/main/java/com/carrotsearch/ant/tasks/junit4/JUnit4.java

Hmm, so does that include compile time (my numbers don't)?  Sounds like
no?  I'm also measuring from first launch to last finish.

>> Can this change run "across" the different groups of tests we have
>> (core, modules/*, contrib/*, solr/*, etc.)?  I found that to be a
>> major bottleneck in the current "ant test"'s concurrency, ie we have a
>> pinch point after each group of tests (must wait for all JVMs to
>> finish before moving on to next group...), but I think fixing that in
>> ant is going to be hard?
>
> If I understand you correctly the problem is that ANT in Lucene/Solr
> calls into sub-module ANT scripts and these in turn invoke the test
> macro. So running everything from a single test task would be possible
> if we had a master-level test script; it's not directly related to how
> the tests are actually executed.

Yes, I think that's the problem!

Ideally ant would just gather up all the "jobs" to run and then
aggregate/distribute them across JVMs.
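
Something like this rough sketch is what I mean (not real ant/junit4
code; the module names, classpaths and run_suite helper are made up):
gather every suite from every module into one shared queue up front,
then let a fixed pool of worker JVMs drain it, so there's never a
per-module pinch point:

    # One global queue of (classpath, suite) jobs drained by a fixed pool
    # of workers; each job carries its own classpath so modules can differ.
    import queue
    import threading

    modules = {
        "lucene-core":    {"classpath": "core.jar",
                           "suites": ["TestIndexWriter", "TestTermVectors"]},
        "lucene-queries": {"classpath": "queries.jar",
                           "suites": ["TestBooleanQuery"]},
    }

    jobs = queue.Queue()
    for mod in modules.values():
        for suite in mod["suites"]:
            jobs.put((mod["classpath"], suite))

    def run_suite(classpath, suite):
        # stand-in for launching the suite in this worker's JVM
        print(f"running {suite} with classpath {classpath}")

    def worker():
        while True:
            try:
                classpath, suite = jobs.get_nowait()
            except queue.Empty:
                return
            run_suite(classpath, suite)

    threads = [threading.Thread(target=worker) for _ in range(20)]  # e.g. 20 slots
    for t in threads:
        t.start()
    for t in threads:
        t.join()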

> That JUnit4 task supports globbing in
> suite selectors, so it could be executed with, say,
> -Dtests.class=org.apache.lucene.blah.* to restrict the run to just a
> certain section of all tests, but include everything by default.

Cool.

> Don't know how it affects modularization though -- the tests will run
> faster but they'll be more difficult to maintain I guess.

Hmm... can we somehow keep today's directory structure but have ant
treat it as a single "module"?  Or is the problem that we need to
change the JVM settings (e.g. CLASSPATH) per test module we have
today, so we must make separate modules for that...?

>> When I use the hacked up Python test runner (runAllTests.py in luceneutil),
>
> This was my inspiration -- Robert pointed me at that, very helpful
> although you need your kind of machine to run so many SSH sessions :D

OK cool :)  Actually it doesn't open any SSH sessions unless you give
it remote machines to use -- for the "local" JVMs it just forks.

>> change (balancing the tests across JVMs).  BUT: that's on current
>> trunk, vs your git clone which is somewhat old by now... so it's an
>> apples/pears comparison ;)
>
> Oh, come on, my fork is only a few days behind! :) I've pulled the
> current trunk and merged. I'd appreciate if you could re-run again,
> this time with, say, 5, 10, 15 and 20 threads. I wonder what the
> speedup/ overhead is. Thanks.

I re-ran the above -- looks like the times came down some, so the new
ant runner is basically on par with the Python runner (on core tests):
great!

Mike McCandless

http://blog.mikemccandless.com

