Mike, unfortunately something changed recently, and I can't run `mvn clean install -T 2C` locally anymore.

I'd like to echo that I think working on fixing the dependency issue is a very good idea. We've actually faced issues with this on the REST API PR. Working to fix this and having a standard way of including/excluding dependencies will be helpful to all, and to Ryan's point will benefit us outside of this context.

On Tue, Feb 7, 2017 at 9:36 AM, Ryan Merriman <[email protected]> wrote:

Debugging integration tests in an IDE uses the same approach with our current infrastructure or with Docker: start up the topology with LocalRunner. I've had mixed success with our current infrastructure. As Mike alluded to, some tests work fine (most of the parser topologies and the enrichment topology) while others fail when run in my IDE but work on the command line (the ES integration test due to Guava issues, and the Squid topology due to some issue with the remove subdomains Stellar function). Of course, with Docker infrastructure you will need a test runner to launch topologies in LocalRunner. They are short and simple, though, and I have one written for each topology that I can share when appropriate.

There are some advantages and disadvantages to switching the integration tests to use Docker. The infrastructure we have now works and could be adjusted to overcome its primary weaknesses (a single classloader, and start up/shutdown after each test). With Docker the classloader issue goes away for the most part (or is much better than it is now) without any extra work. For spinning services up/down once instead of with each test, we will need to adjust our tests to clean up after themselves or (even better) namespace all testing objects so that tests don't step on each other. That work would have to be done no matter which infrastructure approach we take. Probably the biggest downside to using Docker is that all integration tests will need to be adjusted, and we'll likely hit some issues that we'll need to resolve. I was bitten several times by services that broadcast their host address (Kafka, for example) and I bet we'll hit more of those. We'll also need to add a few more containers (HDFS for sure), but those are easy to create as long as you don't hit the issue I just mentioned.

I think all of the suggestions so far are good ideas. I think it goes without saying that we should do one at a time and maybe even reassess after we see the impact of each change. I would vote for doing the Maven/shading one first because it is all around beneficial, even outside of this context.
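The runners Ryan mentions aren't shown in the thread. As a rough illustration of the general idea, a minimal local runner might look like the sketch below. It assumes Storm's LocalCluster API and uses a placeholder topology rather than one of Metron's real parser or enrichment topologies.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.utils.Utils;

/**
 * Minimal local-runner sketch: build a topology, submit it to an in-process
 * LocalCluster, let it run for a while, then shut everything down.
 * The spout and names here are placeholders, not Metron code.
 */
public class LocalTopologyRunner {

  public static void main(String[] args) throws Exception {
    // Placeholder topology; a real runner would build the topology under test.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("test-spout", new TestWordSpout(), 1);

    Config conf = new Config();
    conf.setDebug(true);

    LocalCluster cluster = new LocalCluster();
    try {
      cluster.submitTopology("test-topology", conf, builder.createTopology());
      Utils.sleep(60_000); // give the topology time to process test data
    } finally {
      cluster.shutdown();
    }
  }
}
```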
On Tue, Feb 7, 2017 at 9:04 AM, Casey Stella <[email protected]> wrote:

I believe that some people use Travis and some people request Jenkins from Apache Infra. That being said, personally, I think we should take the opportunity to correct the underlying issues. 50 minutes for a build seems excessive to me.

On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <[email protected]> wrote:

Is there an alternative to Travis? Do other like-sized Apache projects have these problems? Do they use Travis?

On February 6, 2017 at 17:02:37, Casey Stella ([email protected]) wrote:

For those with pending/building pull requests, it will come as no surprise that our build times are increasing at a pace that is worrisome. In fact, we hit a fundamental limit associated with Travis over the weekend. We have crept up into 40+ minute build territory, and Travis seems to error out at around 49 minutes.

Taking the current build (https://travis-ci.org/apache/incubator-metron/jobs/198929446) and looking at just the job times, we're spending about 19-20 minutes (1176.53 seconds) in tests out of 44 minutes and 42 seconds for the whole build. This places the unit tests at around 43% of the build time. I say all of this to point out that while unit tests are a portion of the build, they are not even the majority of the build time. We need an approach that addresses build performance holistically, and we need it soon.

To seed the discussion, I will point to a few things that come to mind. They fit into three broad categories:

*Tests are Slow*

- *Tactical*: We have around 13 tests that take more than 30 seconds and make up 14 minutes of the build. Looking at what we can do to speed up those tests may be worth considering as a tactical approach.
- We are spinning up the same services (e.g. Kafka, Storm) for multiple tests; instead, we could use the Docker infrastructure to spin them up once and reuse them throughout the tests.

*Tests aren't parallel*

Currently we cannot run the build in parallel because the integration test infrastructure spins up its own services that bind to the same ports. If we correct this, we can run the builds in parallel with `mvn -T`.

- Correct this by decoupling the infrastructure from the tests and refactoring the tests to run in parallel.
- Make the integration testing infrastructure bind intelligently to whatever port is available.
- Move the integration tests to their own project. This will let us run the build in parallel, since an individual project's tests will still be run serially.
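A minimal sketch of the "bind intelligently to whatever port is available" item in the list above: ask the OS for a free ephemeral port and hand it to the in-memory component. The port lookup is plain JDK; the component wiring in the trailing comment is hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Finds a free ephemeral port so parallel test runs don't collide on fixed ports. */
public final class FreePort {

  private FreePort() {}

  /**
   * Binds to port 0 so the OS assigns an unused port, then releases it.
   * There is a small race between closing the socket and the test component
   * binding to the port, so callers should be prepared to retry.
   */
  public static int find() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}

// Hypothetical usage when wiring up an in-memory service for an integration test:
//   int kafkaPort = FreePort.find();
//   // start the Kafka (or ZooKeeper, Storm, etc.) test component on kafkaPort
```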
*Packaging is Painful*

We have a sensitive environment in terms of dependencies. As such, we are careful to shade and relocate dependencies that we want to isolate from our transitive dependencies. The consequence of this is that we spend a lot of time in the build shading and relocating Maven module output.

- Do the hard work of walking our transitive dependencies and ensure that we are including only one copy of every library by using exclusions effectively. This will not only bring down build times, it will make sure we know what we're including.
- Try to devise a strategy where we only shade once, at the end. This could look like some combination of:
  - standardizing on the lowest common denominator of a troublesome library
  - shading in dependencies so modules can use different versions of libraries (e.g. metron-common with a modern version of Guava) than the final jars
  - exclusions
  - externalizing infrastructure so we don't have to spin up Hadoop components in-process for integration tests (i.e. the HBase server conflicts with Storm in a few dependencies)

*Final Thoughts*

If I had three to pick, I'd pick:

- moving off of the in-memory component infrastructure to Docker images
- fixing the Maven poms to exclude correctly
- ensuring the resulting tests are parallelizable

I will point out that fixing the Maven poms to exclude correctly (i.e. we choose the version of every jar that we depend on transitively) ticks multiple boxes, not just making things faster.

What are your thoughts? What did I miss? We need a plan and we need to execute on it soon, otherwise Travis is going to keep smacking us hard. It may be worthwhile constructing a tactical plan and then a more strategic plan that we can work toward. I was heartened at how much some of these suggestions dovetail with the discussion around the future of the Docker infrastructure.

Best,

Casey
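Ryan's earlier suggestion to namespace all testing objects, so tests can share long-running services without stepping on each other, could be as simple as prefixing every topic, table, and index with a per-run identifier. A rough sketch; the object names in the usage comment are purely illustrative.

```java
import java.util.UUID;

/**
 * Generates per-test-run names so concurrent runs sharing the same
 * Kafka/HBase/ES services never collide on topics, tables, or indices.
 */
public final class TestNamespace {

  // One random prefix per JVM/test run, e.g. "t_3f9a12b4".
  private static final String PREFIX =
      "t_" + UUID.randomUUID().toString().replace("-", "").substring(0, 8);

  private TestNamespace() {}

  /** For example, name("enrichments") might return "t_3f9a12b4_enrichments". */
  public static String name(String base) {
    return PREFIX + "_" + base;
  }
}

// Hypothetical usage in a test:
//   String inputTopic = TestNamespace.name("indexing");
//   String hbaseTable = TestNamespace.name("threatintel");
```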
