Hey guys, this proposal to add integration tests is awesome!
Echoing Xusen, I recall Pat suggested we could remove pio template get, so it should be okay to just git clone the template from somewhere. I think Marcin is including the template now because the current templates still use artifacts under the old io.prediction package namespace.

Regards,
Donald

On Friday, July 22, 2016, Xusen Yin <[email protected]> wrote:
> Hi Marcin,
>
> Personally I vote for adding integration tests. Thanks for the proposal.
> One suggestion is about the test scenarios. IMHO there is no need to add
> the recommendation-engine template inside the PredictionIO codebase. Why
> not use pio template to download it from GitHub when testing?
>
> Another concern is the docker pull ziemin/pio-testing step in the Travis
> config file. There is also a testing/Dockerfile which starts from ubuntu,
> so I think we should either use docker pull ubuntu or a pre-built testing
> Docker image instead of the Dockerfile.
>
> Best,
> Xusen Yin
>
>
> On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
> >
> > Hi!
> >
> > I have a feeling that PredictionIO is lacking integration tests.
> > TravisCI runs only the unit tests residing in the repository. Better
> > tests are important not only for keeping up the quality of the project,
> > but also for the sheer comfort of development. Therefore, I tried to
> > come up with a simple basis for adding and building tests.
> >
> > - Integration tests should be agnostic to environment settings (it
> > should not matter whether we use Postgres or HBase)
> > - They should be easy to run for developers, and the configuration
> > should not pollute their working space
> >
> > I have pushed a sequence of commits to my personal fork and ran Travis
> > builds on them - diff with upstream:
> > https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure
> >
> > The following changes were introduced:
> >
> > - A dedicated Docker image was prepared. This image fetches and
> > prepares some possible dependencies of PredictionIO - Postgres, HBase,
> > Spark and Elasticsearch.
> > Upon container initialization all services are started, including a
> > Spark standalone cluster. The best way to start it is the
> > testing/run_docker.sh script, which binds the relevant ports and mounts
> > shared directories with the ivy2 cache and PredictionIO's code
> > repository. More importantly, it sets up pio's configuration, e.g.:
> > $ ./run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio \
> >     '/pio_host/testing/simple_scenario/run_scenario.sh'
> > This command sets the metadata repository to PGSQL, event data to HBASE
> > and model data to HDFS. The last two arguments are the path to the repo
> > and a command to run from inside the container.
> > An important thing to note is that the container expects a tar with the
> > built distribution in the shared /pio_host directory, which is later
> > unpacked. Users can then safely execute all pio ... commands. By
> > default, the container pops up a bash shell if not given any other
> > command.
> > - Currently there is only one simple test, which is just a copy of the
> > steps from the quickstart tutorial.
> > - .travis.yml was modified to run 4 concurrent builds: one for the
> > unit tests as before, and three integration builds for various
> > combinations of services:
> > env:
> >   global:
> >     - PIO_HOME=`pwd`
> >   matrix:
> >     - BUILD_TYPE=Unit
> >     - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL
> >       MODELDATA_REP=PGSQL
> >     - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
> >       EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
> >     - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
> >       EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
> > Here you can find the build logs:
> > https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806
> > What is more, to make build times shorter, ivy jars are now cached on
> > Travis, so they are fetched faster in subsequent builds.
> >
> > The current setup lets developers run tests against different
> > environment settings in an easy and deterministic way, and makes it
> > more convenient to use Travis or other CI tools. What is left to do now
> > is to prepare a sensible set of tests written in a concise and
> > extensible way. I think that ideally we could use the Python API and
> > add a small library to it focused strictly on our testing purposes.
> >
> > Any insights would be invaluable.
> >
> >
> > Regards,
> >
> > -- Marcin
> >
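To make the discussion concrete, here is a rough sketch of what the small Python testing library Marcin mentions could look like, combined with Xusen's idea of git-cloning templates instead of pio template get. All names here (run, pio, clone_template, quickstart_scenario) are hypothetical and not part of the actual proposal:

```python
import subprocess

def run(cmd, cwd=None):
    """Run a command, return its stdout; raise on a non-zero exit code."""
    result = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        raise RuntimeError("%s failed:\n%s" % (" ".join(cmd), result.stderr))
    return result.stdout

def pio(action, *args):
    """Compose a pio CLI invocation as an argument list."""
    return ["pio", action] + list(args)

def clone_template(repo_url, dest):
    """Fetch an engine template via git clone rather than pio template get,
    as suggested earlier in the thread."""
    run(["git", "clone", "--depth", "1", repo_url, dest])

def quickstart_scenario(engine_dir):
    """Replay the quickstart steps: build, train and deploy an engine."""
    run(pio("build", "--clean"), cwd=engine_dir)
    run(pio("train"), cwd=engine_dir)
    run(pio("deploy"), cwd=engine_dir)
```

Helpers like these would let a scenario such as testing/simple_scenario be expressed in a few lines of Python, and the same functions could back both local runs inside the Docker container and the Travis matrix builds.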
