Hi! I have a feeling that PredictionIO is lacking integration tests. TravisCI currently runs only the unit tests residing in the repository. Better tests are important not only for keeping up the quality of the project, but also for the sheer comfort of development. Therefore, I tried to come up with a simple basis for adding and running such tests.
- Integration tests should be agnostic to environment settings (it should not matter whether we use Postgres or HBase).
- They should be easy for developers to run, and their configuration should not pollute the working space.

I have pushed a sequence of commits to my personal fork and run Travis builds on them. Diff with upstream: <https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>

The following changes were introduced:

- A dedicated Docker image was prepared. This image fetches and prepares the possible dependencies of PredictionIO: Postgres, HBase, Spark, and Elasticsearch. Upon container initialization all services are started, including a Spark standalone cluster. The best way to start it is the testing/run_docker.sh script, which binds the relevant ports and mounts shared directories with the ivy2 cache and PredictionIO's code repository. More importantly, it sets up pio's configuration, e.g.:

    $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio '/pio_host/testing/simple_scenario/run_scenario.sh'

  This command sets the metadata repository to PGSQL, event data to HBASE, and model data to HDFS. The last two arguments are the path to the repository and a command to run from inside the container. An important thing to note is that the container expects a tar with the built distribution in the shared /pio_host directory, which is later unpacked. The user can then safely execute all pio ... commands. By default the container starts a bash shell if not given any other command.
- Currently there is only one simple test, which is just a copy of the steps from the quickstart tutorial.
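To illustrate what the configuration step amounts to, here is a minimal Python sketch that maps the three positional backend arguments to per-repository storage settings. This is an assumption on my side, not a copy of the script: the variable names follow the usual PIO_STORAGE_* convention in pio-env.sh but are not taken from the actual code.

```python
# Hypothetical sketch: map run_docker.sh's backend arguments (e.g. PGSQL,
# HBASE, HDFS) to the repository lines of a pio-env.sh file.
# Variable names are assumptions, not copied from the real script.

SUPPORTED = {"PGSQL", "HBASE", "HDFS", "ELASTICSEARCH", "LOCALFS"}

def render_pio_env(metadata, eventdata, modeldata):
    """Return pio-env.sh lines selecting a storage source per repository."""
    for backend in (metadata, eventdata, modeldata):
        if backend not in SUPPORTED:
            raise ValueError("unknown backend: %s" % backend)
    return "\n".join([
        "PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=%s" % metadata,
        "PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=%s" % eventdata,
        "PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=%s" % modeldata,
    ])

print(render_pio_env("PGSQL", "HBASE", "HDFS"))
```

The point is only that one short, deterministic mapping drives every test combination, so adding a new backend means touching a single place.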
- .travis.yml was modified to run 4 concurrent builds: one for the unit tests as previously, and three integration builds for various combinations of services:

    env:
      global:
        - PIO_HOME=`pwd`
      matrix:
        - BUILD_TYPE=Unit
        - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
        - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
        - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS

Here you can find the build logs: <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>

What is more, to shorten build times, the ivy jars are now cached on Travis, so they are fetched faster in subsequent builds.

The current setup lets developers run tests against different environment settings in an easy and deterministic way, and makes it more convenient to use Travis or other CI tools. What is left to do now is to prepare a sensible set of tests written in a concise and extensible way. I think that ideally we could use the Python API and add to it a small library focused strictly on our testing purposes. Any insights would be invaluable.

Regards,
--
Marcin
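P.S. A minimal sketch of what such a Python testing library could look like, to make the proposal concrete. The class and method names are made up for illustration; only the `pio` subcommands (app new, build, train, deploy) come from the quickstart steps the current test follows.

```python
# Hypothetical sketch of the proposed testing helper: it only builds the
# `pio` command lines for the quickstart steps; run() would hand them to
# subprocess. Names other than the pio subcommands are invented here.
import subprocess

class QuickstartScenario:
    def __init__(self, app_name, engine_dir):
        self.app_name = app_name
        self.engine_dir = engine_dir

    def commands(self, port=8000):
        """The quickstart steps as argv lists, in execution order."""
        return [
            ["pio", "app", "new", self.app_name],
            ["pio", "build"],
            ["pio", "train"],
            ["pio", "deploy", "--port", str(port)],
        ]

    def run(self):
        """Execute each step inside the engine directory, failing fast."""
        for argv in self.commands():
            subprocess.run(argv, cwd=self.engine_dir, check=True)
```

Keeping command construction separate from execution would let us assert on scenarios in plain unit tests, while the same objects drive the real runs inside the container.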
