Hi!

I have a feeling that PredictionIO lacks integration tests. TravisCI
runs only the unit tests residing in the repository. Better tests are
important not only for keeping up the quality of the project, but also for
the sheer comfort of development. Therefore, I tried to come up with a
simple basis for adding and building tests.

   - Integration tests should be agnostic to environment settings (it
   should not matter whether we use Postgres or HBase)
   - They should be easy for developers to run, and the configuration
   should not pollute their working space

I have pushed a sequence of commits to my personal fork and ran Travis
builds on them. Diff with upstream:
<https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>

The following changes were introduced:

   - A dedicated Docker image was prepared. It fetches and sets up the
   possible dependencies of PredictionIO: Postgres, HBase, Spark, and
   Elasticsearch.
   Upon container initialization all services are started, including a
   Spark standalone cluster. The best way to start it is the
   testing/run_docker.sh script, which binds the relevant ports and mounts
   shared directories with the ivy2 cache and PredictionIO's code
   repository. More importantly, it sets up pio's configuration, e.g.:
   $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
   '/pio_host/testing/simple_scenario/run_scenario.sh'
   This command sets the metadata repository to PGSQL, event data to HBASE,
   and model data to HDFS. The last two arguments are the path to the
   repository and a command to run from inside the container.
   An important thing to note is that the container expects a tar with the
   built distribution in the shared /pio_host directory, which is later
   unpacked.
   The user can then safely execute any pio ... command. By default the
   container pops up a bash shell if not given any other command.
   - Currently only one simple test is added; it is just a copy of the
   steps from the quickstart tutorial.
   - .travis.yml was modified to run 4 concurrent builds: one for the unit
   tests, as previously, and three integration builds for various
   combinations of services:
   env:
     global:
       - PIO_HOME=`pwd`

     matrix:
       - BUILD_TYPE=Unit
       - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
   Here you can find the build logs: travis logs
   <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
   What is more, to shorten build times, ivy jars are now cached on
   Travis, so subsequent builds resolve dependencies faster.
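To replay that Travis matrix locally, a developer could drive run_docker.sh from a small script. A minimal sketch, assuming the backend triples from the .travis.yml matrix above; the repository path and scenario command are just the placeholders from the example invocation, not fixed values:

```python
"""Sketch: build run_docker.sh invocations for each Travis backend combination."""
import shlex

# (metadata, eventdata, modeldata) triples mirroring the .travis.yml matrix
MATRIX = [
    ("PGSQL", "PGSQL", "PGSQL"),
    ("ELASTICSEARCH", "HBASE", "LOCALFS"),
    ("ELASTICSEARCH", "PGSQL", "HDFS"),
]

def docker_command(meta, event, model,
                   repo="~/projects/incubator-predictionio",
                   scenario="/pio_host/testing/simple_scenario/run_scenario.sh"):
    """Assemble one run_docker.sh command line for the given backends."""
    return " ".join(
        ["testing/run_docker.sh", meta, event, model, repo,
         shlex.quote(scenario)])

if __name__ == "__main__":
    # Print the three invocations instead of running them
    for combo in MATRIX:
        print(docker_command(*combo))
```

Running the printed commands one after another would then exercise the same three environment combinations as the CI builds.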

The current setup gives developers an easy way to run tests against
different environment settings in a deterministic way, and makes it more
convenient to use Travis or other CI tools. What is left to do now is to
prepare a sensible set of tests written in a concise and extensible way. I
think that ideally we could use the Python API and add a small library to
it, focused strictly on our testing purposes.
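To make the idea of such a library concrete, here is a minimal sketch of what a scenario could look like. Everything in it is an assumption for illustration: the PioRunner class, the dry_run mode, the app name, and the particular pio subcommands wrapped are not part of the current code.

```python
"""Sketch of a tiny helper library for pio-driven integration scenarios.

Hypothetical API for illustration only; names and subcommands are assumptions.
"""
import subprocess

class PioRunner:
    """Runs pio subcommands, recording each one for later assertions."""

    def __init__(self, pio_bin="pio", dry_run=False):
        self.pio_bin = pio_bin
        self.dry_run = dry_run
        self.history = []          # command lines issued so far

    def run(self, *args):
        cmd = [self.pio_bin] + list(args)
        self.history.append(" ".join(cmd))
        if self.dry_run:
            return 0               # record only; do not touch the system
        return subprocess.call(cmd)

def quickstart_scenario(runner):
    """The quickstart-style steps expressed as library calls."""
    runner.run("app", "new", "MyTestApp")
    runner.run("build")
    runner.run("train")
    runner.run("deploy")
```

A scenario written this way stays readable, and the dry_run mode lets the library itself be unit-tested without a running container.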

Any insights would be invaluable.


Regards,

-- Marcin
