PipelineOptions allows you to aggregate common options within a parent
interface, with child interfaces extending and inheriting those options.

For example, you can have MyIOOptionsTest1 and MyIOOptionsTest2, both
extending MyIOOptions. MyIOOptions can hold the shared fields such as host
and port, and then MyIOOptionsTest1 and MyIOOptionsTest2 can each add the
parameters specific to their own test.
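As a rough sketch of the shape this takes, using the hypothetical option
names from this thread (the `PipelineOptions` marker interface below is a
local stand-in for `org.apache.beam.sdk.options.PipelineOptions`, so the
snippet compiles without the Beam SDK on the classpath):

```java
// Stand-in for org.apache.beam.sdk.options.PipelineOptions, so this
// sketch is self-contained; in real code, extend the Beam interface.
interface PipelineOptions {}

// Parent interface holding the fields shared by all MyIO ITs.
interface MyIOOptions extends PipelineOptions {
  String getHost();
  void setHost(String host);

  int getPort();
  void setPort(int port);
}

// Child interface adding options specific to Test1 (hypothetical names).
interface MyIOOptionsTest1 extends MyIOOptions {
  String getTest1TableName();
  void setTest1TableName(String name);
}

// Child interface adding options specific to Test2 (hypothetical names).
interface MyIOOptionsTest2 extends MyIOOptions {
  int getTest2BatchSize();
  void setTest2BatchSize(int size);
}
```

With the real Beam SDK, something like
`PipelineOptionsFactory.fromArgs(args).as(MyIOOptionsTest1.class)` would
then pick up both the shared flags (--host, --port) and the test-specific
ones from a single command line.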


On Tue, Jan 10, 2017 at 4:11 PM, Stephen Sisk <s...@google.com.invalid>
wrote:

> thanks Jason for doing this research! There are a lot of options.
>
> You mentioned "Steven" - I assume you're talking about me :)
>
> As you mention in your goals, we want this to work for both individual devs
> working on a particular store, as well as the beam CI system. I'm inclined
> to favor making this pretty easy for individual devs, as long as it doesn't
> come at the cost of making CI config painful. Given that, I suspect that
> we'd want something whose configuration is easy to discover, that gives an
> obvious error message if it's misconfigured, and that is easy to learn.
>
> In that world, I would lean towards options 2 & 3. I think a file that has
> to be maintained/configured separately outside of the repository isn't the
> most developer friendly option. That may be somewhat of a workflow thing.
> And if it was super-important to keep the passwords secret, then a file
> outside of the repository might be the only option that would work -
> however, I think we can simplify the problem and secure our test servers by
> keeping them on a private network. (I don't think we are an important
> target, so I'm not too worried about defense in depth)
>
> Between options 2 & 3 I would favor option 2 - I think having defaults
> pre-set to work in the beam CI environment will lead to a lot of potential
> confusion on the part of developers, since it would let them run the test,
> and then only give them errors about connecting to a server. I think it'd
> be better to get errors about not having pipeline options set, or errors
> about obviously wrong/default pipeline option values, rather than "correct,
> but not for your environment" settings. For example, if we have valid
> defaults, and a user only sets some of the values (i.e., sets IP but not
> port), it'd be trickier to detect when the value being used is valid. I'd
> want the test to be able to show users an error that the option wasn't set,
> rather than have it try to connect and timeout b/c the port isn't correct.
>
> I also don't think that having per-IT options is a good idea. If 2
> different ITs can share a data store, we should be able to pass them the
> same config - that also makes me strongly favor an option where we have a
> shared set of options. Do we *have* to make it part of the
> TestPipelineOptions? Can we make something like a BeamIOTestPipelineOption?
>
> This generally leads me to favor option 2.
>
> S
>
> On Tue, Jan 10, 2017 at 3:39 PM Jason Kuster <jasonkus...@google.com.invalid>
> wrote:
>
> > Hi all,
> >
> > Following up on some of the discussions already on-list, I wanted to
> > solicit some more feedback about some implementation details regarding
> > the IO Integration Tests.
> >
> > As it currently stands, we mostly have IO ITs for GCP-based IO, which our
> > GCP-based Jenkins executors handle natively, but as our integration test
> > coverage begins to expand, we're going to run into several of the
> > problems relevant to what Steven is doing with hosting data stores for
> > use by ITs. I wanted to get people's feedback about how to handle
> > passing credentials to the ITs. We have a couple of options, motivated
> > by some goals.
> >
> > Goals:
> >
> > * Has to work in Apache Beam CI environment.
> > * Has to run on dev machines (w/o access to beam CI environment).
> > * Only one way of passing datastore config.
> > * An individual IT fails fast if run and it doesn't have valid config.
> > * IO performance tests will have a validation component (this implies we
> > need to run the IO ITs, not just the IO IT pipelines).
> > * Devs working on an individual IO transform can run Integration & perf
> > tests without recreating the data stores every time
> > * Devs working on a runner's IO can run all the IO integration & perf
> > tests. They may have to recreate the data stores every time (or possibly
> > have a manual config that helps with this.) It's okay if this world is a
> > bit awkward, it just needs to be possible.
> >
> >
> > Option 1: IO Configuration File
> >
> > The first option is to read all credentials from some file stored on
> > disk. We can define a location for an (xml, json, yaml, etc.) file which
> > we can read in each IT to find the credentials that IT needs. This
> > method has a couple of upsides and a couple of downsides.
> >
> > * Upsides
> >     * Passing credentials to ITs, and adding new credentials, is
> > relatively easy.
> >     * Individual users can spin up their own data stores, put the
> > credentials in the file, run their ITs and have things just work.
> > * Downsides
> >     * Relying on a file, especially a file not checked in to the
> > repository (to prevent people from accidentally sharing credentials to
> > their data store, etc.) is fragile and can lead to some confusing
> > failure cases.
> >     * ITs are supposed to be self-contained; using a file on disk makes
> > things like running them in CI harder.
> >     * It seems like datastore location, username, and password are things
> > that are a better fit for the IT PipelineOptions anyway.
> >
> >
> > Option 2: TestPipelineOptions
> >
> > Another option is to specify them as general PipelineOptions on
> > TestPipelineOptions and then to build the specific IT's options from
> > there.
> > For example, say we have MyStoreIT1, MyStoreIT2 and MyStoreIT3. We could
> > specify inside of TestPipelineOptions some options like "MyStoreIP",
> > "MyStoreUsername", and "MyStorePassword", and then the command for
> > invoking them would look like (omitting some irrelevant things):
> >
> > mvn clean verify -DskipITs=false -DbeamTestPipelineOptions='[...,
> > "--MyStoreIP=1.2.3.4", "--MyStoreUsername=beamuser",
> > "--MyStorePassword=hunter2"]'.
> > * Upsides
> >     * Test is self-contained -- no dependency on an external file and
> > all relevant things can be specified on the command line; easier for
> > users and CI.
> >     * Passing credentials to ITs via pipelineoptions feels better.
> > * Downsides
> >     * Harder to pass different credentials to one specific IT; e.g. I
> > want MyStoreIT1 and 2 to run against 1.2.3.4, but MyStoreIT3 to run
> > against 9.8.7.6.
> >     * Investing in this pattern means a proliferation of
> > TestPipelineOptions. Potentially bad, especially for a CI suite running a
> > large number of ITs -- size of command line args may become unmanageable
> > with 5+ data stores.
> >
> >
> > Option 3: Individual IT Options
> >
> > The last option I can think of is to specify the options directly on the
> > IT's options, e.g. MyStoreIT1Options, and set defaults which work well
> > for CI. This means that CI could run an entire suite of ITs without
> > specifying any arguments and trusting that the ITs' defaults will work,
> > but means an individual developer is potentially able to run only one IT
> > at a time, since it will be impossible to override all options from the
> > command line.
> > * Upsides
> >     * Test is still self-contained, and even more so -- possible to
> > specify args targeted at one IT in particular.
> >     * Args are specified right where they're used; way smaller chance of
> > confusion or mistakes.
> >     * Easiest for CI -- as long as defaults for data store auth and
> > location are correct from the perspective of the Jenkins executor, it can
> > essentially just turn all ITs on and run them as is.
> > * Downsides
> >     * Hardest for individual developers to run an entire suite of ITs --
> > since defaults are configured for running in CI environment, they will
> > likely fail when running on the user's machine, resulting in annoyance
> > for the user.
> >
> >
> > If anyone has thoughts on these, please let me know.
> >
> > Best,
> >
> > Jason
> >
> > --
> > -------
> > Jason Kuster
> > Apache Beam (Incubating) / Google Cloud Dataflow
> >
>
