+1 for integration tests. We have one for the UR that tests PIO too and could 
be a starting point. It should be possible to create a template-independent 
one that only tests the EventServer or some null template. 

https://github.com/actionml/template-scala-parallel-universal-recommendation/blob/master/examples/integration-test

By looking at all the scripts it calls, you can see how we employ example code 
in integration tests. We could encourage this as a pattern for new templates 
and get great benefits.
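For concreteness, a template integration test in that style boils down to a short shell script: build, train, and deploy the example engine, query it, and diff the answer against a stored expected result. The sketch below is a hypothetical illustration of that pattern, not the actual UR script; the app name, port, and query fields are made up, and the engine commands are shown as comments since they need a live PIO install:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the integration-test pattern. The engine flow
# (pio build/train/deploy, then a query) appears as comments because it
# requires a running PIO installation; check_result is the reusable
# comparison step. App name, port, and query fields are illustrative.

# Compare an actual JSON response with the expected one, ignoring
# whitespace differences; returns 0 (success) on a match.
check_result() {
  local expected actual
  expected=$(tr -d '[:space:]' <<< "$1")
  actual=$(tr -d '[:space:]' <<< "$2")
  [ "$expected" = "$actual" ]
}

# Illustrative end-to-end flow:
#   pio app new IntegrationTestApp
#   pio build && pio train
#   pio deploy --port 8000 &
#   sleep 30   # crude wait for the engine server to come up
#   actual=$(curl -s -H 'Content-Type: application/json' \
#     -d '{"user":"u1","num":4}' http://localhost:8000/queries.json)
#   check_result "$(cat expected.json)" "$actual" && echo PASS || echo FAIL
```

A template-independent variant would only exercise the EventServer part of this flow, skipping the train/deploy steps.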

On Jul 22, 2016, at 8:28 PM, Marcin Ziemiński <[email protected]> wrote:

I agree that engine templates should be downloaded separately by tests, but
as Donald mentioned, I included an engine and modified it so that it could
be built after the recent namespace modifications. Therefore, engine
templates should also be updated.

As far as Docker is concerned, I might not have been clear enough. The
Dockerfile you can find in the repository was used to build the image being
pulled on Travis. It is called ziemin/pio-testing because it is currently a
draft I pushed to my repository on Docker Hub. Moreover, this is the essence
of this proposal - it gives you a way to run tests deterministically in a
prepared environment. Besides, it makes Travis builds faster - instead of
downloading and installing all dependencies every time, you just fetch one
pre-built package that works out of the box.
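To make that workflow concrete, here is a hedged sketch of how such a pre-built image could be pulled and invoked locally. Only the image name ziemin/pio-testing comes from this thread; the docker run flags and environment variable names below are assumptions for illustration (the real interface lives in testing/run_docker.sh):

```shell
#!/usr/bin/env bash
# Sketch only: assembles a docker invocation for a given backend
# combination. The -v/-e flags and variable names are assumptions for
# illustration; the actual interface is testing/run_docker.sh.

docker_cmd() {
  local meta="$1" event="$2" model="$3" repo="$4" cmd="$5"
  echo "docker run -it" \
       "-v $repo:/pio_host" \
       "-e METADATA_REP=$meta -e EVENTDATA_REP=$event -e MODELDATA_REP=$model" \
       "ziemin/pio-testing $cmd"
}

# Example (one-time pull, then run the simple scenario):
#   docker pull ziemin/pio-testing
#   eval "$(docker_cmd PGSQL HBASE HDFS ~/projects/incubator-predictionio \
#     /pio_host/testing/simple_scenario/run_scenario.sh)"
```

The point is that the whole environment, including the service combination, is fixed by the image plus a few parameters, so a failing test reproduces locally exactly as on Travis.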

pt., 22.07.2016 o 16:37 użytkownik Simon Chan <[email protected]>
napisał:

> Integration tests are a great idea.
> 
> Yea, I guess going forward using git directly to download templates may be
> a viable option.
> 
> Simon
> 
> On Friday, July 22, 2016, Donald Szeto <[email protected]> wrote:
> 
>> Hey guys,
>> 
>> This proposal of adding integration tests is awesome!
>> 
>> Echoing Xusen, I recall Pat suggested we could remove pio template get, so
>> it should be okay to just git clone the template from somewhere. I think
>> Marcin is including the template now because currently templates still use
>> artifacts under the old io.prediction package namespace.
>> 
>> Regards,
>> Donald
>> 
>> On Friday, July 22, 2016, Xusen Yin <[email protected]> wrote:
>> 
>>> Hi Marcin,
>>> 
>>> Personally I vote for adding integration tests. Thanks for the proposal.
>>> One suggestion is about the test scenarios. IMHO there is no need to add
>>> the recommendation-engine template inside the predictionio codebase. Why
>>> not use pio template to download them from GitHub when testing?
>>> 
>>> Another concern is the docker pull ziemin/pio-testing in the travis
>>> config file. There is also a testing/Dockerfile which starts from ubuntu.
>>> So I think we should either use docker pull ubuntu, or use a pre-built
>>> testing Docker image instead of the Dockerfile.
>>> 
>>> Best
>>> Xusen Yin
>>> 
>>>> On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> I have a feeling that PredictionIO is lacking integration tests. TravisCI
>>>> executes only the unit tests residing in the repository. Better tests are
>>>> important not only for keeping up the quality of the project, but also
>>>> for the sheer comfort of development. Therefore, I tried to come up with
>>>> a simple basis for adding and building tests.
>>>> 
>>>>  - Integration tests should be agnostic to environment settings (it
>>>>  should not matter whether we use Postgres or HBase)
>>>>  - They should be easy to run for developers, and the configuration
>>>>  should not pollute their working space
>>>> 
>>>> I have pushed a sequence of commits to my personal fork and ran travis
>>>> builds on them - Diff with upstream:
>>>> <https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>
>>>> 
>>>> The following changes were introduced:
>>>> 
>>>>  - A dedicated Docker image was prepared. This image fetches and prepares
>>>>  some possible dependencies for PredictionIO - postgres, hbase, spark,
>>>>  elasticsearch.
>>>>  Upon container initialization all services are started, including a
>>>>  spark standalone cluster. The best way to start it is to use the
>>>>  testing/run_docker.sh script, which binds relevant ports and mounts
>>>>  shared directories with the ivy2 cache and PredictionIO's code
>>>>  repository. More importantly, it sets up pio's configuration, e.g.:
>>>>  $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
>>>>  '/pio_host/testing/simple_scenario/run_scenario.sh'
>>>>  This command sets the metadata repo to PGSQL, event data to HBASE and
>>>>  model data to HDFS. The last two arguments are the path to the repo and
>>>>  a command to run from inside the container.
>>>>  An important thing to note is that the container expects a tar with the
>>>>  built distribution to be found in the shared /pio_host directory, which
>>>>  is later unpacked.
>>>>  The user can then safely execute all pio ... commands. By default the
>>>>  container pops up a bash shell if not given any other command.
>>>>  - Currently there is only one simple test added, which is just a copy of
>>>>  the steps mentioned in the quickstart tutorial.
>>>>  - .travis.yml was modified to run 4 concurrent builds: one for unit
>>>>  tests as previously, and three integration tests for various
>>>>  combinations of services:
>>>>  env:
>>>>    global:
>>>>      - PIO_HOME=`pwd`
>>>> 
>>>>    matrix:
>>>>      - BUILD_TYPE=Unit
>>>>      - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL
>>>>        MODELDATA_REP=PGSQL
>>>>      - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
>>>>        EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
>>>>      - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
>>>>        EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
>>>>  Here you can find the build logs: travis logs
>>>>  <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
>>>>  What is more, to make build times shorter, ivy jars are now cached on
>>>>  travis, so that subsequent builds pick them up faster.
>>>> 
>>>> The current setup lets developers run tests for different environment
>>>> settings in a deterministic way, as well as use travis or other CI tools
>>>> in a more convenient way. What is left to do now is to prepare a sensible
>>>> set of different tests written in a concise and extensible way. I think
>>>> that ideally we could use the python API and add a small library to it
>>>> focused strictly on our testing purposes.
>>>> 
>>>> Any insights would be invaluable.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> -- Marcin
>>> 
>>> 
>> 
> 
