[ 
https://issues.apache.org/jira/browse/HBASE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038062#comment-17038062
 ] 

Mark Robert Miller commented on HBASE-23779:
--------------------------------------------

I see what's going on here. A lot ;)

To some degree Maven is not helping: its closest equivalent to Gradle's 
awesome parallel build performance can be a fair bit more expensive, at the 
least. That's just half the equation though. It's really largely small-to-medium, 
or potentially small-to-medium, tests masquerading as large or super large 
tests, and/or waiting for a non-CI, less intense option. Expanding limits and 
trying to baby and isolate the tests has gotten HBase to like a billion tests, 
which I am both impressed and frustrated with. So. Many. Test classes. Just 
tossing that many no-op tests at an executor is going to take some time; toss a 
new 2800 MB JVM in between for most of them and it will take a little more :)

We could address it all within a few months' time (the basics anyway; there is 
really so much that can be done and improved beyond the basics). I'll convince 
someone to champion a reverse course: shrink resources and expose the tests to 
a bit of hell on purpose, for profit and pleasure. There is a lot of hidden 
flakiness. Hiding it is a valid strategy in these cases: if you can hide it 
well enough, at least you get a mostly rational signal. But with so many tests 
and hours-long run times, any real stability will be forever elusive, a mirage, 
or hanging by a thread. You also just pay for it in so many ways, even if it 
does end up with some success.

We can expose these tests to sunlight and it will force us to shape them right 
up.

 

> Up the default fork count to make builds complete faster; make count relative 
> to CPU count
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-23779
>                 URL: https://issues.apache.org/jira/browse/HBASE-23779
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: addendum2.patch, test_yetus_934.0.patch
>
>
> Tests take a long time. Our fork counts when running all tests are 
> conservative -- 1 for the first part (small) and 5 for the second part 
> (medium and large). Rather than hardcoding, we should set the fork count 
> relative to machine size. The suggestion here is 0.75C, where C is the CPU 
> count. This ups the CPU use on my box.
> Looking at Jenkins, it seems the boxes have 24 cores, at least going by my 
> random survey. The load reported on a few seems low, though this is not 
> representative (looking at machine/uptime).
> More parallelism will probably mean more test failures. Let me take a look 
> and see.
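
For a rough feel of what 0.75C means on different boxes, here is a minimal 
sketch of the arithmetic. The class and method names are hypothetical, and the 
rounding (truncate, never go below one fork) is an assumption made for 
illustration, not code taken from HBase or from the Surefire plugin.

```java
// Hypothetical helper for illustration only; not HBase or Surefire code.
class ForkCountSketch {

    /**
     * Scales a fork count with the number of CPUs.
     * Assumption: truncate the product and never return fewer than 1 fork.
     */
    static int forkCount(double multiplier, int cpuCount) {
        if (multiplier <= 0 || cpuCount <= 0) {
            throw new IllegalArgumentException("multiplier and cpuCount must be positive");
        }
        int scaled = (int) Math.floor(multiplier * cpuCount);
        return Math.max(1, scaled);
    }

    public static void main(String[] args) {
        // C is whatever the JVM reports for the current machine.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("This machine (" + cpus + " CPUs): "
                + forkCount(0.75, cpus) + " forks");
        // The 24-core figure comes from the Jenkins survey in the description.
        System.out.println("24-core box: " + forkCount(0.75, 24) + " forks");
    }
}
```

Under these assumptions a 24-core Jenkins box would run 18 forks, against the 
current hardcoded 1 and 5. In the build itself no such code should be needed: 
Surefire's forkCount parameter accepts a value ending in "C" and multiplies it 
by the number of available cores.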



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
