Hi!

I have a feeling that PredictionIO lacks integration tests. TravisCI
runs only the unit tests residing in the repository. Better tests are
important not only for keeping up the quality of the project, but also for
the sheer comfort of development. Therefore, I tried to come up with a
simple basis for adding and building tests.

   - Integration tests should be agnostic to environment settings (it
   should not matter whether we use Postgres or HBase)
   - They should be easy for developers to run, and the configuration
   should not pollute their working space

I have pushed a sequence of commits to my personal fork and ran Travis
builds on them. Diff with upstream:
<https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>

The following changes were introduced:

   - A dedicated Docker image was prepared. It fetches and sets up the
   possible dependencies of PredictionIO: Postgres, HBase, Spark, and
   Elasticsearch.
   Upon container initialization all services are started, including a
   Spark standalone cluster. The best way to start it is the
   testing/run_docker.sh script, which binds the relevant ports and mounts
   shared directories with the ivy2 cache and PredictionIO's code
   repository. More importantly, it sets up pio's configuration, e.g.:
   $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
   '/pio_host/testing/simple_scenario/run_scenario.sh'
   This command sets the metadata repository to PGSQL, event data to HBASE,
   and model data to HDFS. The last two arguments are the path to the
   repository and a command to run from inside the container.
   An important thing to note is that the container expects a tar with the
   built distribution in the shared /pio_host directory, which is later
   unpacked.
   The user can then safely execute any pio ... command. By default the
   container pops up a bash shell if not given any other command.
   - Currently only one simple test is added; it is just a copy of the
   steps from the quickstart tutorial.
   - .travis.yml was modified to run 4 concurrent builds: one for the unit
   tests, as previously, and three integration builds for various
   combinations of services:
   env:
     global:
       - PIO_HOME=`pwd`

     matrix:
       - BUILD_TYPE=Unit
       - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
   Here you can find the build logs: travis logs
   <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
   What is more, to shorten build times, ivy jars are now cached on
   Travis, so subsequent builds resolve dependencies faster.
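To replay that Travis matrix locally, a developer could drive run_docker.sh from a small script. A minimal sketch, assuming the backend triples from the .travis.yml matrix above; the repository path and scenario command are just the placeholders from the example invocation, not fixed values:

```python
"""Sketch: build run_docker.sh invocations for each Travis backend combination."""
import shlex

# (metadata, eventdata, modeldata) triples mirroring the .travis.yml matrix
MATRIX = [
    ("PGSQL", "PGSQL", "PGSQL"),
    ("ELASTICSEARCH", "HBASE", "LOCALFS"),
    ("ELASTICSEARCH", "PGSQL", "HDFS"),
]

def docker_command(meta, event, model,
                   repo="~/projects/incubator-predictionio",
                   scenario="/pio_host/testing/simple_scenario/run_scenario.sh"):
    """Assemble one run_docker.sh command line for the given backends."""
    return " ".join(
        ["testing/run_docker.sh", meta, event, model, repo,
         shlex.quote(scenario)])

if __name__ == "__main__":
    # Print the three invocations instead of running them
    for combo in MATRIX:
        print(docker_command(*combo))
```

Running the printed commands one after another would then exercise the same three environment combinations as the CI builds.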

The current setup gives developers an easy way to run tests against
different environment settings in a deterministic way, and makes it more
convenient to use Travis or other CI tools. What is left to do now is to
prepare a sensible set of tests written in a concise and extensible way. I
think that ideally we could use the Python API and add a small library to
it, focused strictly on our testing purposes.
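To make the idea of such a library concrete, here is a minimal sketch of what a scenario could look like. Everything in it is an assumption for illustration: the PioRunner class, the dry_run mode, the app name, and the particular pio subcommands wrapped are not part of the current code.

```python
"""Sketch of a tiny helper library for pio-driven integration scenarios.

Hypothetical API for illustration only; names and subcommands are assumptions.
"""
import subprocess

class PioRunner:
    """Runs pio subcommands, recording each one for later assertions."""

    def __init__(self, pio_bin="pio", dry_run=False):
        self.pio_bin = pio_bin
        self.dry_run = dry_run
        self.history = []          # command lines issued so far

    def run(self, *args):
        cmd = [self.pio_bin] + list(args)
        self.history.append(" ".join(cmd))
        if self.dry_run:
            return 0               # record only; do not touch the system
        return subprocess.call(cmd)

def quickstart_scenario(runner):
    """The quickstart-style steps expressed as library calls."""
    runner.run("app", "new", "MyTestApp")
    runner.run("build")
    runner.run("train")
    runner.run("deploy")
```

A scenario written this way stays readable, and the dry_run mode lets the library itself be unit-tested without a running container.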

Any insights would be invaluable.


Regards,

-- Marcin
