ah! I looked around a bit more and found the dcos package repo - https://github.com/mesosphere/universe/tree/version-3.x/repo/packages
poking around a bit, I can find a lot of packages for single node instances, but not many packages for multi-node instances. Single node instance packages are kind of useful, but I don't think they're *too* helpful. The multi-node instance packages that run the data store's high availability mode are where the real work is, and it seems like both kubernetes helm and dcos' package universe don't have a lot of those.

S

On Wed, Jan 18, 2017 at 9:56 AM Stephen Sisk <[email protected]> wrote:
> Hi Ismaël,
> these are good questions, thanks for raising them.
> Ability to modify network/compute resources to simulate failures
> =================================================
> I see two real questions here:
> 1. Is this something we want to do?
> 2. Is it possible with both/either?
> So far, the test strategy I've been advocating is that we test problems like this in unit tests rather than do this in ITs/Perf tests. Otherwise, it's hard to re-create the same conditions.
> I can investigate whether it's possible, but I want to clarify whether this is something that we care about. I know both support killing individual nodes. I haven't seen a lot of network control in either, but haven't tried to look for it.
> Availability of ready to play packages
> ============================
> I did look at this, and as far as I could tell, mesos didn't have any pre-built packages for multi-node clusters of data stores. If there's a good repository of them that we trust, that would definitely save us time. Can you point me at the mesos repository?
> S
> On Wed, Jan 18, 2017 at 8:37 AM Jean-Baptiste Onofré <[email protected]> wrote:
> Hi Ismaël
> Stephen will reply with details, but I know he did a comparison and evaluated different options.
> He tested with the JDBC IO itests.
> Regards
> JB
> On Jan 18, 2017, 08:26, at 08:26, "Ismaël Mejía" <[email protected]> wrote:
> > Thanks for your analysis Stephen, good arguments / references.
> > One quick question. Have you checked the APIs of both (Mesos/Kubernetes) to see if we can programmatically do more complex tests (I suppose so, but you don't mention how easy or whether those are possible), for example to simulate a slow networking slave (to test stragglers), or to arbitrarily kill one slave (e.g. if I want to test the correct behavior of a runner/IO that is reading from it)?
> > Another missing point in the review is the availability of ready-to-play packages; I think in this area mesos/dcos seems more advanced, no? I haven't looked recently, but at least 6 months ago there were not many helm packages ready, for example, to test kafka or the hadoop ecosystem stuff (hdfs, hbase, etc). Has this been improved? Preparing this is also a considerable amount of work; on the other hand, this could also be a chance to contribute to kubernetes.
> > Regards,
> > Ismaël
> > On Wed, Jan 18, 2017 at 2:36 AM, Stephen Sisk <[email protected]> wrote:
> >> hi!
> >> I've been continuing this investigation, and have some more info to report, and hopefully we can start making some decisions.
> >> To support performance testing, I've been investigating mesos+marathon and kubernetes for running data stores in their high availability mode. I have been examining features that kubernetes/mesos+marathon use to support this.
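
For illustration, the "killing individual nodes" idea discussed above could be done programmatically on Kubernetes with a minimal sketch like the one below. It assumes the fabric8 kubernetes-client library; the namespace and pod name are placeholders, not anything from this thread.

    import io.fabric8.kubernetes.client.DefaultKubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClient;

    public class NodeKiller {
      public static void main(String[] args) {
        // Connects using the local kubeconfig / in-cluster configuration.
        try (KubernetesClient client = new DefaultKubernetesClient()) {
          // Deleting a pod simulates a node failure for the data store
          // running in it; Kubernetes will schedule a replacement.
          client.pods().inNamespace("io-tests").withName("cassandra-2").delete();
        }
      }
    }
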
> >> Setting up a multi-node cluster in a high availability mode tends to be more expensive time-wise than the single node instances I've played around with in the past. Rather than do a full build out with both kubernetes and mesos, I'd like to pick one of the two options to build the prototype cluster with. If the prototype doesn't go well, we could still go back to the other option, but I'd like to change us from a mode of "let's look at all the options" to one of "here's the favorite, let's prove that works for us".
> >> Below are the features that I've seen are important to multi-node instances of data stores. I'm sure other folks on the list have done this before, so feel free to pipe up if I'm missing a good solution to a problem.
> >> DNS/Discovery
> >> --------------------
> >> Necessary for talking between nodes (eg, cassandra nodes all need to be able to talk to a set of seed nodes.)
> >> * Kubernetes has built-in DNS/discovery between nodes.
> >> * Mesos supports this via mesos-dns, which isn't a part of core mesos, but is in dcos, which is the mesos distribution I've been using and that I would expect us to use.
> >> Instances properly distributed across nodes
> >> ------------------------------------------------------------
> >> If multiple instances of a data source end up on the same underlying VM, we may not get good performance out of those instances since the underlying VM may be more taxed than other VMs.
> >> * Kubernetes has a beta feature, StatefulSets [1], which allows containers to be distributed so that there's one container per underlying machine (as well as a lot of other useful features, like easy stable dns names.)
> >> * Mesos can support this via the built-in UNIQUE constraint [2]
> >> Load balancing
> >> --------------------
> >> Incoming requests from users need to be distributed to the various machines - this is important for many data stores' high availability modes.
> >> * Kubernetes supports easily hooking up to an external load balancer when on a cloud (and can be configured to work with a built-in load balancer if not)
> >> * Mesos supports this via marathon-lb [3], which is an installable package in DC/OS
> >> Persistent Volumes tied to specific instances
> >> ------------------------------------------------------------
> >> Databases often need persistent state (for example to store the data :), so it's an important part of running our service.
> >> * Kubernetes StatefulSets support this
> >> * Mesos+marathon apps with persistent volumes support this [4] [5]
> >> As I mentioned above, I'd like to focus on either kubernetes or mesos for my investigation, and as I go further along, I'm seeing kubernetes as better suited to our needs.
> >> (1) It supports more of the features we want out of the box, and with StatefulSets, Kubernetes handles them all together neatly - eg. DC/OS requires marathon-lb to be installed and mesos-dns to be configured.
> >> (2) I'm also finding that there seem to be more examples of using kubernetes to solve the types of problems we're working on. This is somewhat subjective, but in my experience as I've tried to learn both kubernetes and mesos, I personally found it generally easier to get kubernetes running than mesos, due to the tutorials/examples available for kubernetes.
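
As a side note on the DNS/discovery and StatefulSets points above: the stable names a StatefulSet gives each pod follow the documented pattern <statefulset-name>-<ordinal>.<headless-service>.<namespace>.svc.cluster.local, which is what makes seed-node configuration in tests straightforward. A minimal Java sketch, with hypothetical names (a 3-replica StatefulSet and headless service both called "cassandra" in the "default" namespace):

    import java.util.ArrayList;
    import java.util.List;

    public class SeedNodes {
      // Builds the stable pod DNS names a StatefulSet guarantees.
      static List<String> cassandraSeeds(int replicas) {
        List<String> seeds = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
          seeds.add("cassandra-" + i + ".cassandra.default.svc.cluster.local");
        }
        return seeds;
      }
    }
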
> >> (3) Lower cost of initial setup - as I discussed in a previous mail [6], kubernetes was far easier to get set up even when I knew the exact steps. Mesos took me around 27 steps [7], which involved a lot of config that was easy to get wrong (it took me about 5 tries to get all the steps correct in one go.) Kubernetes took me around 8 steps and very little config.
> >> Given that, I'd like to focus my investigation/prototyping on Kubernetes. To be clear, it's fairly close and I think both Mesos and Kubernetes could support what we need, so if we run into issues with kubernetes, Mesos still seems like a viable option that we could fall back to.
> >> Thanks,
> >> Stephen
> >> [1] Kubernetes StatefulSets https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/
> >> [2] mesos unique constraint - https://mesosphere.github.io/marathon/docs/constraints.html
> >> [3] https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing.html and https://mesosphere.com/blog/2015/12/04/dcos-marathon-lb/
> >> [4] https://mesosphere.github.io/marathon/docs/persistent-volumes.html
> >> [5] https://dcos.io/docs/1.7/usage/tutorials/marathon/stateful-services/
> >> [6] Container Orchestration software for hosting data stores https://lists.apache.org/thread.html/5825b35b895839d0b33b6c726c1de0e76bdb9653d1e913b1207c6c4d@%3Cdev.beam.apache.org%3E
> >> [7] https://github.com/ssisk/beam/blob/support/support/mesos/setup.md
> >> On Thu, Dec 29, 2016 at 5:44 PM Davor Bonaci <[email protected]> wrote:
> >> > Just a quick drive-by comment: how tests are laid out has non-trivial tradeoffs on how/where continuous integration runs, and how results are integrated into the tooling. The current state is certainly not ideal (e.g., due to multiple test executions some links in Jenkins point where they shouldn't), but most other alternatives had even bigger drawbacks at the time. If someone has great ideas that don't explode the number of modules, please share ;-)
> >> > On Mon, Dec 26, 2016 at 6:30 AM, Etienne Chauchot <[email protected]> wrote:
> >> > > Hi Stephen,
> >> > > Thanks for taking the time to comment.
> >> > > My comments are below in the email:
> >> > > On 24/12/2016 at 00:07, Stephen Sisk wrote:
> >> > >> hey Etienne -
> >> > >> thanks for your thoughts and thanks for sharing your experiences. I generally agree with what you're saying. Quick comments below:
> >> > >>> IT are stored alongside UT in the src/test directory of the IO, but they might go to a dedicated module, waiting for a consensus
> >> > >> I don't have a strong opinion or feel that I've worked enough with maven to understand all the consequences - I'd love for someone with more maven experience to weigh in. If this becomes blocking, I'd say check it in, and we can refactor later if it proves problematic.
> >> > > Sure, not a blocking point, it could be refactored afterwards. Just as a reminder, JB mentioned that storing ITs in a separate module allows us to have more coherence between all ITs (same behavior) and to do cross-IO integration tests. JB, have you experienced any long-term drawbacks of storing ITs in a separate module, like, for example, more difficult maintenance due to "distance" from the production code?
> >> > >>> Also IMHO, it is better that tests load/clean data than making assumptions about the running order of the tests.
> >> > >> I definitely agree that we don't want to make assumptions about the running order of the tests - that way lies pain. :) It will be interesting to see how the performance tests work out, since they will need more data (and thus loading data can take much longer.)
> >> > > Yes, performance testing might push in the direction of data loading from outside the tests due to loading time.
> >> > >> This should also be an easier problem for read tests than for write tests - if we have long-running instances, read tests don't really need cleanup. And if write tests only write a small amount of data, as long as we are sure we're writing to uniquely identifiable locations (ie, new table per test or something similar), we can clean up the write test data on a slower schedule.
> >> > > I agree
> >> > >>> this will tend to go in the direction of long-running data store instances rather than data store instances started (and optionally loaded) before tests.
> >> > >> It may be easiest to start with a "data stores stay running" implementation, and then if we see issues with that, move towards tests that start/stop the data stores on each run. One thing I'd like to make sure of is that we're not manually tweaking the configurations for data stores. One way we could do that is to destroy/recreate the data stores on a slower schedule - maybe once per week. That way if the script is changed or the data store instances are changed, we'd be able to detect it relatively soon while still removing the need for the tests to manage the data stores.
> >> > > I agree. In addition to manual configuration tweaking, there might be cases in which a data store re-partitions data during a test or after some tests as the dataset changes. The IO must be tolerant of that, but the asserts (number of bundles, for example) in the test must not fail in that case. I would also prefer, if possible, that the tests do not manage data stores (not set them up, start them, or stop them)
> >> > >> as a general note, I suspect many of the folks in the states will be on holiday until Jan 2nd/3rd.
> >> > >> S
> >> > >> On Fri, Dec 23, 2016 at 7:48 AM Etienne Chauchot <[email protected]> wrote:
> >> > >>> Hi,
> >> > >>> Recently we had a discussion about integration tests of IOs.
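
A concrete version of the "uniquely identifiable locations" idea Stephen raises above might look like the sketch below (the naming scheme is just an illustration): each write test targets a fresh, timestamped table/index, so runs never collide and a periodic job can garbage-collect anything older than the cleanup window.

    import java.time.Instant;
    import java.util.UUID;

    public class TestTables {
      // e.g. "es_write_it_1484750000_3f2a91bc" - unique per run, and the
      // embedded timestamp lets a weekly job delete old tables.
      static String uniqueTableName(String testName) {
        return testName + "_" + Instant.now().getEpochSecond() + "_"
            + UUID.randomUUID().toString().substring(0, 8);
      }
    }
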
> >> > >>> I'm preparing a PR for integration tests of the Elasticsearch IO (https://github.com/echauchot/incubator-beam/tree/BEAM-1184-ELASTICSEARCH-IO as a first shot) which are very important IMHO because they helped catch some bugs that UT could not (volume, data store instance sharing, real data store instance ...)
> >> > >>> I would like to have your thoughts/remarks about the points below. Some of these points are also discussed here https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.7ly6e7beup8a :
> >> > >>> - UT and IT have a similar architecture, but while UT focus on testing the correct behavior of the code, including corner cases, and use an embedded in-memory data store, IT assume that the behavior is correct (strong UT) and focus on higher volume testing and testing against real data store instance(s)
> >> > >>> - For now, IT are stored alongside UT in the src/test directory of the IO, but they might go to a dedicated module, waiting for a consensus. Maven is not configured to run them automatically because the data store is not available on the jenkins server yet
> >> > >>> - For now, they only use DirectRunner, but they will be run against each runner.
> >> > >>> - IT do not set up the data store instance (as stated in the above document); they assume that one is already running (hardcoded configuration in the test for now, waiting for a common solution to pass configuration to IT). A docker container script is provided in the contrib directory as a starting point for whatever orchestration software will be chosen.
> >> > >>> - IT load and clean test data before and after each test if needed. It is simpler to do so because some tests need an empty data store (write test) and because, as discussed in the document, tests might not be the only users of the data store. Also IMHO, it is better that tests load/clean data than making assumptions about the running order of the tests.
> >> > >>> If we generalize this pattern to all IT tests, this will tend to go in the direction of long-running data store instances rather than data store instances started (and optionally loaded) before tests.
> >> > >>> Besides, if we were to change our minds and load data from outside the tests, a logstash script is provided.
> >> > >>> If you have any thoughts or remarks I'm all ears :)
> >> > >>> Regards,
> >> > >>> Etienne
> >> > >>> On 14/12/2016 at 17:07, Jean-Baptiste Onofré wrote:
> >> > >>>> Hi Stephen,
> >> > >>>> the purpose of having them in a specific module is to share resources and apply the same behavior from the IT perspective, and to be able to have IT "cross" IO (for instance, reading from JMS and sending to Kafka; I think that's the key idea for integration tests).
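
The load/clean-per-test pattern Etienne describes could look roughly like this in JUnit 4 - a sketch only, assuming an Elasticsearch instance reachable over plain HTTP at a hardcoded address (the index name and endpoint are placeholders, not the actual ElasticsearchIO IT code):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.junit.After;
    import org.junit.Before;

    public class ElasticsearchIOIT {
      private static final String INDEX_URL = "http://localhost:9200/beam_it_index";

      @Before
      public void loadTestData() throws Exception {
        // Recreate the index so every test starts from a known dataset.
        request("DELETE", INDEX_URL, null); // a 404 on the first run is fine
        request("PUT", INDEX_URL, "{}");
        // ... bulk-load test documents here ...
      }

      @After
      public void cleanTestData() throws Exception {
        request("DELETE", INDEX_URL, null);
      }

      private static void request(String method, String url, String body) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod(method);
        if (body != null) {
          c.setDoOutput(true);
          try (OutputStream os = c.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
          }
        }
        c.getResponseCode(); // forces the request; status handling omitted
        c.disconnect();
      }
    }
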
> >> > >>>> For instance, in Karaf, we have:
> >> > >>>> - utest in each module
> >> > >>>> - itest module containing itests for all modules all together
> >> > >>>> Regards
> >> > >>>> JB
> >> > >>>> On 12/14/2016 04:59 PM, Stephen Sisk wrote:
> >> > >>>>> Hi Etienne,
> >> > >>>>> thanks for following up and answering my questions.
> >> > >>>>> re: where to store integration tests - having them all in a separate module is an interesting idea. I couldn't find JB's comments about moving them into a separate module in the PR - can you share the reasons for doing so? The IO integration/perf tests do seem like they'll need to be treated in a special manner, but given that there is already an IO-specific module, it may just be that we need to treat all the ITs in the IO module the same way. I don't have strong opinions either way right now.
> >> > >>>>> S
> >> > >>>>> On Wed, Dec 14, 2016 at 2:39 AM Etienne Chauchot <[email protected]> wrote:
> >> > >>>>> Hi guys,
> >> > >>>>> @Stephen: I addressed all your comments directly in the PR, thanks! I just wanted to comment here about the docker image I used: the only official Elastic image contains only Elasticsearch. But for testing I needed logstash (for ingestion) and kibana (not for integration tests, but to easily test REST requests to ES using sense). This is why I use an ELK (Elasticsearch+Logstash+Kibana) image. This one is released under the Apache 2 license.
> >> > >>>>> Besides, there is also a point about where to store integration tests: JB proposed in the PR to store integration tests in a dedicated module rather than directly in the IO module (like I did).
> >> > >>>>> Etienne
> >> > >>>>> On 01/12/2016 at 20:14, Stephen Sisk wrote:
> >> > >>>>>> hey!
> >> > >>>>>> thanks for sending this. I'm very excited to see this change. I added some detail-oriented code review comments in addition to what I've discussed here.
> >> > >>>>>> The general goal is to allow for re-usable instantiation of particular data store instances, and this seems like a good start. Looks like you also have a script to generate test data for your tests - that's great.
> >> > >>>>>> The next steps (definitely not blocking your work) will be to have ways to create instances from the docker images you have here, and use them in the tests.
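
"Create instances from the docker images and use them in the tests" could start as simply as shelling out to docker from test setup code. A rough sketch - the image tag and port mapping here are placeholders, not the ones from the PR's contrib scripts:

    import java.io.IOException;
    import java.util.Scanner;

    public class LocalInstance {
      // Starts a disposable Elasticsearch container and returns its id so
      // teardown code can remove it later (docker rm -f <id>).
      static String startContainer() throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "docker", "run", "-d", "-p", "9200:9200", "elasticsearch:2.4")
            .start();
        p.waitFor();
        try (Scanner s = new Scanner(p.getInputStream())) {
          return s.hasNext() ? s.next() : null; // `docker run -d` prints the id
        }
      }
    }
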
> >> > >>>>>> We'll need support in the test framework for that, since it'll be different on developer machines and in the beam jenkins cluster, but your scripts here allow someone running these tests locally to not have to worry about getting the instance set up, and they can manually adjust, so this is a good incremental step.
> >> > >>>>>> I have some thoughts now that I'm reviewing your scripts (that I didn't have previously, so we are learning this together):
> >> > >>>>>> * It may be useful to try and document why we chose a particular docker image as the base (ie, "this is the official supported elastic search docker image" or "this image has several data stores together that can be used for a couple different tests") - I'm curious as to whether the community thinks that is important
> >> > >>>>>> One thing that I called out in the comment that's worth mentioning on the larger list - if you want to specify which specific runners a test uses, that can be controlled in the pom for the module. I updated the testing doc mentioned previously in this thread with a TODO to talk about this more. I think we should also make it so that IO modules have that automatically, so developers don't have to worry about it.
> >> > >>>>>> S
> >> > >>>>>> On Thu, Dec 1, 2016 at 9:00 AM Etienne Chauchot <[email protected]> wrote:
> >> > >>>>>> Stephen,
> >> > >>>>>> As discussed, I added an injection script, docker container scripts, and integration tests to the sdks/java/io/elasticsearch/contrib <https://github.com/apache/incubator-beam/pull/1439/files/1e7e2f0a6e1a1777d31ae2c886c920efccd708b5#diff-e243536428d06ade7d824cefcb3ed0b9> directory in that PR: https://github.com/apache/incubator-beam/pull/1439.
> >> > >>>>>> These work well, but they are a first shot. Do you have any comments about them?
> >> > >>>>>> Besides, I am not very sure that these files should be in the IO itself (even in the contrib directory, out of maven source directories). Any thoughts?
> >> > >>>>>> Thanks,
> >> > >>>>>> Etienne
> >> > >>>>>> On 23/11/2016 at 19:03, Stephen Sisk wrote:
> >> > >>>>>>> It's great to hear more experiences.
> >> > >>>>>>> I'm also glad to hear that people see real value in the high volume/performance benchmark tests. I tried to capture that in the Testing doc I shared, under "Reasons for Beam Test Strategy". [1]
> >> > >>>>>>> It does generally sound like we're in agreement here.
> >> > >>>>>>> Areas of discussion I see:
> >> > >>>>>>> 1. People like the idea of bringing up fresh instances for each test rather than keeping instances running all the time, since that ensures no contamination between tests. That seems reasonable to me. If we see flakiness in the tests or we note that setting up/tearing down instances is taking a lot of time, we can revisit.
> >> > >>>>>>> 2. Deciding on cluster management software/orchestration software - I want to make sure we land on the right tool here, since choosing the wrong tool could result in administration of the instances taking more work. I suspect that's a good place for a follow-up discussion, so I'll start a separate thread on that. I'm happy with whatever tool we choose, but I want to make sure we take a moment to consider different options and have a reason for choosing one.
> >> > >>>>>>> Etienne - thanks for being willing to port your creation/other scripts over. You might be a good early tester of whether this system works well for everyone.
> >> > >>>>>>> Stephen
> >> > >>>>>>> [1] Reasons for Beam Test Strategy - https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit?ts=58349aec#
> >> > >>>>>>> On Wed, Nov 23, 2016 at 12:48 AM Jean-Baptiste Onofré <[email protected]> wrote:
> >> > >>>>>>>> I second Etienne there.
> >> > >>>>>>>> We worked together on the ElasticsearchIO and definitely, the most valuable tests we did were integration tests with ES on docker and high volume.
> >> > >>>>>>>> I think we have to distinguish the two kinds of tests:
> >> > >>>>>>>> 1. utests are located in the IO itself and basically they should cover the core behaviors of the IO
> >> > >>>>>>>> 2. itests are located as contrib in the IO (they could be part of the IO but executed by the integration-test plugin or a specific profile) and deal with a "real" backend and high volumes. The resources required by the itests can be bootstrapped by Jenkins (for instance using Mesos/Marathon and docker images as already discussed, and it's what I'm doing on my own "server").
> >> > >>>>>>>> It's basically what Stephen described.
> >> > >>>>>>>> We must not rely only on itests: utests are very important and they validate the core behavior.
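
JB's utest/itest split is typically wired up in maven via the failsafe plugin or a profile; at the code level, one lightweight way to mark the two kinds of tests is JUnit 4 categories. A sketch - the IntegrationTest marker interface is hypothetical, not an existing Beam class:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    public class SomeIOTest {
      // Hypothetical marker interface that the pom's surefire/failsafe
      // configuration can include or exclude.
      public interface IntegrationTest {}

      @Test
      public void coreBehavior() {
        // utest: runs everywhere, covers the IO's core behavior.
      }

      @Test
      @Category(IntegrationTest.class)
      public void againstRealBackend() {
        // itest: only run by the profile that enables the category.
      }
    }
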
> >> > >>>>>>>> My $0.01 ;)
> >> > >>>>>>>> Regards
> >> > >>>>>>>> JB
> >> > >>>>>>>> On 11/23/2016 09:27 AM, Etienne Chauchot wrote:
> >> > >>>>>>>>> Hi Stephen,
> >> > >>>>>>>>> I like your proposition very much and I also agree that docker + some orchestration software would be great!
> >> > >>>>>>>>> On the ElasticsearchIO (PR to be created this week) there are docker container creation scripts and a logstash data ingestion script for the IT environment available in the contrib directory, alongside the integration tests themselves. I'll be happy to make them compliant with the new IT environment.
> >> > >>>>>>>>> What you say below about the need for an external IT environment is particularly true. As an example, with ES what came out in the first implementation was that there were problems starting at some high volume of data (timeouts, ES windowing overflow...) that could not have been seen on the embedded ES version. Also, there were some particularities of the external instance, like secondary (replica) shards, that were not visible on the embedded instance.
> >> > >>>>>>>>> Besides, I also favor bringing up instances before tests because it allows us (amongst other things) to be sure to start on a fresh dataset, so the test is deterministic.
> >> > >>>>>>>>> Etienne
> >> > >>>>>>>>> On 23/11/2016 at 02:00, Stephen Sisk wrote:
> >> > >>>>>>>>>> Hi,
> >> > >>>>>>>>>> I'm excited we're getting lots of discussion going. There are many threads of conversation here; we may choose to split some of them off into a different email thread. I'm also betting I missed some of the questions in this thread, so apologies ahead of time for that. Also apologies for the amount of text; I provided some quick summaries at the top of each section.
> >> > >>>>>>>>>> Amit - thanks for your thoughts. I've responded in detail below.
> >> > >>>>>>>>>> Ismaël - thanks for offering to help. There's plenty of work here to go around. I'll try and think about how we can divide up some next steps (probably in a separate thread.)
> >> > >>>>>>>>>> The main next step I see is deciding between kubernetes/mesos+marathon/docker swarm - I'm working on that, but having lots of different thoughts on what the advantages/disadvantages of those are would be helpful (I'm not entirely sure of the protocol for collaborating on sub-projects like this.)
> >> > >>>>>>>>>> These issues are all related to what kind of tests we want to write. I think a kubernetes/mesos/swarm cluster could support all the use cases we've discussed here (and thus should not block moving forward with this), but understanding what we want to test will help us understand how the cluster will be used. I'm working on a proposed user guide for testing IO Transforms, and I'm going to send out a link to that + a short summary to the list shortly so folks can get a better sense of where I'm coming from.
> >> > >>>>>>>>>> Here's my thinking on the questions we've raised here -
> >> > >>>>>>>>>> Embedded versions of data stores for testing
> >> > >>>>>>>>>> --------------------
> >> > >>>>>>>>>> Summary: yes! But we still need real data stores to test against.
> >> > >>>>>>>>>> I am a gigantic fan of using embedded versions of the various data stores. I think we should test everything we possibly can using them, and do the majority of our correctness testing using embedded versions + the direct runner. However, it's also important to have at least one test that actually connects to an actual instance, so we can get coverage for things like credentials, real connection strings, etc...
> >> > >>>>>>>>>> The key point is that embedded versions definitely can't cover the performance tests, so we need to host instances if we want to test that. I consider the integration tests/performance benchmarks to be costly things that we do only for the IO transforms with large amounts of community support/usage. A random IO transform used by a few users doesn't necessarily need integration & perf tests, but for heavily used IO transforms, there's a lot of community value in these tests. The maintenance proposal below scales with the amount of community support for a particular IO transform.
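
In Beam terms, the "embedded versions + the direct runner" correctness testing described above looks roughly like the following sketch. Create.of stands in for an IO read against an embedded store, since the real source depends on the IO under test:

    import org.apache.beam.sdk.testing.PAssert;
    import org.apache.beam.sdk.testing.TestPipeline;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;
    import org.junit.Rule;
    import org.junit.Test;

    public class EmbeddedReadTest {
      @Rule public final transient TestPipeline p = TestPipeline.create();

      @Test
      public void readsFromEmbeddedStore() {
        // In a real test this would be MyIO.read() pointed at an
        // embedded/in-memory instance of the data store.
        PCollection<String> rows = p.apply(Create.of("a", "b"));
        PAssert.that(rows).containsInAnyOrder("a", "b");
        p.run().waitUntilFinish();
      }
    }
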
> >> > >>>>>>>>>> > >> > >>>>>>>>>> > >> > >>>>>>>>>> > >> > >>>>>>>>>> Reusing data stores ("use the data stores across > >executions.") > >> > >>>>>>>>>> ------------------ > >> > >>>>>>>>>> Summary: I favor a hybrid approach: some frequently > >used, very > >> > >>>>>>>>>> small > >> > >>>>>>>>>> instances that we keep up all the time + larger > >> multi-container > >> > >>>>>>>>>> data > >> > >>>>>>>>>> store > >> > >>>>>>>>>> instances that we spin up for perf tests. > >> > >>>>>>>>>> > >> > >>>>>>>>>> I don't think we need to have a strong answer to this > >> question, > >> > >>>>>>>>>> but I > >> > >>>>>>>>>> think > >> > >>>>>>>>>> we do need to know what range of capabilities we need, > >and use > >> > >>>>>>>>>> that to > >> > >>>>>>>>>> inform our requirements on the hosting infrastructure. I > >think > >> > >>>>>>>>>> kubernetes/mesos + docker can support all the scenarios > >I > >> > discuss > >> > >>>>>>>>>> > >> > >>>>>>>>> below. > >> > >>>>>> > >> > >>>>>>> I had been thinking of a hybrid approach - reuse some > >instances > >> and > >> > >>>>>>>>>> > >> > >>>>>>>>> don't > >> > >>>>>>>> > >> > >>>>>>>>> reuse others. Some tests require isolation from other > >tests > >> (eg. > >> > >>>>>>>>>> performance benchmarking), while others can easily > >re-use the > >> > same > >> > >>>>>>>>>> database/data store instance over time, provided they > >are > >> > >>>>>>>>>> written in > >> > >>>>>>>>>> > >> > >>>>>>>>> the > >> > >>>>>> > >> > >>>>>>> correct manner (eg. a simple read or write correctness > >> integration > >> > >>>>>>>>>> > >> > >>>>>>>>> tests) > >> > >>>>>>>> > >> > >>>>>>>>> To me, the question of whether to use one instance over > >time > >> for > >> > a > >> > >>>>>>>>>> test vs > >> > >>>>>>>>>> spin up an instance for each test comes down to a trade > >off > >> > >>>>>>>>>> between > >> > >>>>>>>>>> > >> > >>>>>>>>> these > >> > >>>>>>>> > >> > >>>>>>>>> factors: > >> > >>>>>>>>>> 1. Flakiness of spin-up of an instance - if it's super > >flaky, > >> > >>>>>>>>>> we'll > >> > >>>>>>>>>> want to > >> > >>>>>>>>>> keep more instances up and running rather than bring > >them > >> > up/down. > >> > >>>>>>>>>> > >> > >>>>>>>>> (this > >> > >>>>>> > >> > >>>>>>> may also vary by the data store in question) > >> > >>>>>>>>>> 2. Frequency of testing - if we are running tests every > >5 > >> > >>>>>>>>>> minutes, it > >> > >>>>>>>>>> > >> > >>>>>>>>> may > >> > >>>>>>>> > >> > >>>>>>>>> be wasteful to bring machines up/down every time. If we > >run > >> > >>>>>>>>>> tests once > >> > >>>>>>>>>> > >> > >>>>>>>>> a > >> > >>>>>> > >> > >>>>>>> day or week, it seems wasteful to keep the machines up the > >whole > >> > >>>>>>>>>> time. > >> > >>>>>>>>>> 3. Isolation requirements - If tests must be isolated, > >it > >> means > >> > we > >> > >>>>>>>>>> > >> > >>>>>>>>> either > >> > >>>>>>>> > >> > >>>>>>>>> have to bring up the instances for each test, or we have > >to > >> have > >> > >>>>>>>>>> some > >> > >>>>>>>>>> sort > >> > >>>>>>>>>> of signaling mechanism to indicate that a given instance > >is in > >> > >>>>>>>>>> use. I > >> > >>>>>>>>>> strongly favor bringing up an instance per test. > >> > >>>>>>>>>> 4. Number/size of containers - if we need a large number > >of > >> > >>>>>>>>>> machines > >> > >>>>>>>>>> for a > >> > >>>>>>>>>> particular test, keeping them running all the time will > >use > >> more > >> > >>>>>>>>>> resources. > >> > >>>>>>>>>> > >> > >>>>>>>>>> > >> > >>>>>>>>>> The major unknown to me is how flaky it'll be to spin > >these > >> up. 
> >> > >>>>>>>>>> I'm hopeful/assuming they'll be pretty stable to bring up, but I think the best way to test that is to start doing it.
> >> > >>>>>>>>>> I suspect the sweet spot is the following: have a set of very small data store instances that stay up to support small-data-size post-commit end-to-end tests (post-commits run frequently and the data size means the instances would not use many resources), combined with the ability to spin up larger instances for once-a-day/week performance benchmarks (these use up more resources and are used less frequently.) That's the mix I'll propose in my docs on testing IO transforms. If spinning up new instances is cheap/non-flaky, I'd be fine with the idea of spinning up instances for each test.
> >> > >>>>>>>>>> Management ("what's the overhead of managing such a deployment")
> >> > >>>>>>>>>> --------------------
> >> > >>>>>>>>>> Summary: I propose that anyone can contribute scripts for setting up data store instances + integration/perf tests, but if the community doesn't maintain a particular data store's tests, we disable the tests and turn off the data store instances.
> >> > >>>>>>>>>> Management of these instances is a crucial question. First, let's break down what tasks we'll need to do on a recurring basis:
> >> > >>>>>>>>>> 1. Ongoing maintenance (update to new versions, both instance & dependencies) - we don't want to have a lot of old versions that are buggy or vulnerable to attacks
> >> > >>>>>>>>>> 2. Investigate breakages/regressions
> >> > >>>>>>>>>> (I'm betting there will be more things we'll discover - let me know if you have suggestions)
> >> > >>>>>>>>>> There are a couple of goals I see:
> >> > >>>>>>>>>> 1. We should only do sys admin work for things that give us a lot of benefit. (ie, don't build IT/perf/data store set up scripts for data stores without a large community)
> >> > >>>>>>>>>> 2. We should do as much of the testing as possible via in-memory/embedded testing (as you brought up).
> >> > >>>>>>>>>> 3. Reduce the amount of manual administration overhead
> >> > >>>>>>>>>> As I discussed above, I think that integration tests/performance benchmarks are costly things that we should do only for the IO transforms with large amounts of community support/usage. Thus, I propose that we limit the IO transforms that get integration tests & performance benchmarks to those that have community support for maintaining the data store instances.
> >> > >>>>>>>>>> We can enforce this organically using some simple rules:
> >> > >>>>>>>>>> 1. Investigating breakages/regressions: if a given integration/perf test starts failing and no one investigates it within a set period of time (a week?), we disable the tests and shut off the data store instances if we have instances running. When someone wants to step up and support it again, they can fix the test, check it in, and re-enable the test.
> >> > >>>>>>>>>> 2. Ongoing maintenance: every N months, file a jira issue that is just "is the IO Transform X data store up to date?" - if the jira is not resolved in a set period of time (1 month?), the perf/integration tests are disabled, and the data store instances shut off.
> >> > >>>>>>>>>> This is pretty flexible -
> >> > >>>>>>>>>> * If a particular person or organization wants to support an IO transform, they can. If a group of people all organically organize to keep the tests running, they can.
> >> > >>>>>>>>>> * It can be mostly automated - there's not a lot of central organizing work that needs to be done.
> >> > >>>>>>>>>> Exposing the information about what IO transforms currently have running IT/perf benchmarks on the website will let users know what IO transforms are well supported.
> >> > >>>>>>>>>> I like this solution, but I also recognize this is a tricky problem. This is something the community needs to be supportive of, so I'm open to other thoughts.
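
The "disable the tests" half of this proposal could also be automated at the test level; for instance (a sketch - the property name is made up), ITs can skip themselves when a data store's instances have been turned off, rather than fail:

    import org.junit.Assume;
    import org.junit.Test;

    public class MaintainedOnlyIT {
      @Test
      public void againstHostedInstance() {
        // Skips (rather than fails) when the hosted data store has been
        // shut off for lack of a maintainer. Boolean.getBoolean returns
        // true only if the system property is set to "true".
        Assume.assumeTrue("data store instances disabled",
            Boolean.getBoolean("io.datastore.enabled"));
        // ... actual integration test ...
      }
    }
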
> >> > >>>>>>>>>> Simulating failures in real nodes ("programmatic tests to simulate failure")
> >> > >>>>>>>>>> -----------------
> >> > >>>>>>>>>> Summary: 1) Focus our testing on the code in Beam. 2) We should encourage a design pattern separating out network/retry logic from the main IO transform logic.
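
The design pattern named in that summary might look like the following sketch (hypothetical names, not actual Beam code): keep the IO transform behind a narrow client interface, and put the network/retry policy in its own class, so unit tests can inject failures without any real cluster.

    import java.io.IOException;

    public class RetrySeparation {
      // The IO transform depends only on this narrow interface.
      interface DataStoreClient {
        void write(String record) throws IOException;
      }

      // The retry policy lives in its own, separately unit-testable class;
      // a test can hand in a delegate that throws on the first N calls.
      static class RetryingClient implements DataStoreClient {
        private final DataStoreClient delegate;
        private final int maxAttempts;

        RetryingClient(DataStoreClient delegate, int maxAttempts) {
          this.delegate = delegate;
          this.maxAttempts = maxAttempts;
        }

        @Override
        public void write(String record) throws IOException {
          IOException last = null;
          for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
              delegate.write(record);
              return;
            } catch (IOException e) {
              last = e;
            }
          }
          throw last;
        }
      }
    }

This keeps failure-mode coverage in fast unit tests, which matches the thread's earlier point that ITs and perf tests are a poor place to recreate failure conditions deterministically.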
