[ https://issues.apache.org/jira/browse/HBASE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038062#comment-17038062 ]
Mark Robert Miller commented on HBASE-23779:
--------------------------------------------
I see what's going on here. A lot ;)
To some degree Maven is not helping: approximating Gradle's awesome parallel
build performance can be a fair bit more expensive, at the least. That's just
half the equation, though. It's really largely small-to-medium (or potentially
small-to-medium) tests masquerading as large or super-large tests, and/or tests
waiting on a less intense, non-CI option. Expanding limits and trying to baby
and isolate the tests has gotten HBase to, like, a billion tests, which I am both
impressed and frustrated by. So. Many. Test classes. Just tossing that many
no-op tests at an executor is going to take some time; toss a new 2800 MB JVM
in between for most of them and it will take a little more :)
We could address it all within a few months' time (the basics, anyway; there is
really so much more that can be done to improve on the basics). I'll convince
someone to champion a reverse course: shrink resources and expose the tests to a
bit of hell on purpose, for profit and pleasure. There is a lot of hidden
flakiness. Hiding it is a valid strategy in these cases: if you can hide it well
enough, at least it's a mostly rational signal. But with so many tests and
hours-long run times, any real stability will be forever elusive, a mirage, or
hanging by a thread. You also just pay for it in so many ways, even if it does
end up with some success.
We can expose these tests to sunlight and it will force us to shape them right
up.
> Up the default fork count to make builds complete faster; make count relative
> to CPU count
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-23779
> URL: https://issues.apache.org/jira/browse/HBASE-23779
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Michael Stack
> Assignee: Michael Stack
> Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: addendum2.patch, test_yetus_934.0.patch
>
>
> Tests take a long time. Our fork counts when running all tests are conservative --
> 1 for the first part (small) and 5 for the second part (medium and large). Rather
> than hardcoding, we should set the fork count relative to machine size.
> Suggestion here is 0.75C where C is CPU count. This ups the CPU use on my box.
> Looking at Jenkins, it seems like the boxes are 24 cores... at least going
> by my random survey. The load reported on a few seems low, though this is not
> representative (looking at machine/uptime).
> More parallelism will probably mean more test failures. Let me take a look
> see.
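The issue proposes a fork count of 0.75C instead of the hardcoded 1 and 5. Maven Surefire's `forkCount` parameter accepts values like this, where a trailing `C` multiplies the number by the available CPU cores. Below is a minimal Java sketch of that arithmetic only. The `ForkCount` class and `resolve` method are hypothetical, and the floor-with-a-minimum-of-one rounding is an assumption of this sketch, not Surefire's documented behavior.

```java
import java.util.Locale;

/**
 * Hypothetical helper: resolves a fork-count spec such as "5" or "0.75C",
 * where a trailing 'C' means "multiply by the number of CPU cores".
 * Rounding (floor, minimum of one fork) is an assumption of this sketch,
 * not necessarily what Maven Surefire does internally.
 */
public class ForkCount {

    static int resolve(String spec, int cores) {
        String s = spec.trim().toUpperCase(Locale.ROOT);
        if (s.endsWith("C")) {
            double factor = Double.parseDouble(s.substring(0, s.length() - 1));
            // Scale by core count, floor, and never go below one fork.
            return Math.max(1, (int) Math.floor(factor * cores));
        }
        // Plain number: a fixed fork count, as in the current hardcoded setup.
        return Math.max(1, Integer.parseInt(s));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " forks(0.75C)=" + resolve("0.75C", cores));
        // The 24-core Jenkins box size guessed at in the issue.
        System.out.println("24 cores -> " + resolve("0.75C", 24));
    }
}
```

On the 24-core boxes guessed at above, 0.75C works out to 18 concurrent forks, versus the current 5 for medium and large tests.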
--
This message was sent by Atlassian Jira
(v8.3.4#803005)