Regards JB
On 01/18/2017 12:52 PM, Stephen Sisk wrote:
ah! I looked around a bit more and found the dcos package repo - https://github.com/mesosphere/universe/tree/version-3.x/repo/packages

Poking around a bit, I can find a lot of packages for single node instances, but not many packages for multi-node instances. Single node instance packages are kind of useful, but I don't think they're *too* helpful. The multi-node instance packages that run the data store's high availability mode are where the real work is, and it seems like both kubernetes helm and dcos' package universe don't have a lot of those.

S

On Wed, Jan 18, 2017 at 9:56 AM Stephen Sisk <[email protected]> wrote:

Hi Ismaël,

these are good questions, thanks for raising them.

Ability to modify network/compute resources to simulate failures
=================================================
I see two real questions here:
1. Is this something we want to do?
2. Is it possible with both/either?

So far, the test strategy I've been advocating is that we test problems like this in unit tests rather than in ITs/Perf tests. Otherwise, it's hard to re-create the same conditions. I can investigate whether it's possible, but I want to clarify whether this is something that we care about. I know both support killing individual nodes. I haven't seen a lot of network control in either, but haven't tried to look for it.

Availability of ready to play packages
============================
I did look at this, and as far as I could tell, mesos didn't have any pre-built packages for multi-node clusters of data stores. If there's a good repository of them that we trust, that would definitely save us time. Can you point me at the mesos repository?

S

On Wed, Jan 18, 2017 at 8:37 AM Jean-Baptiste Onofré <[email protected]> wrote:

Hi Ismael,

Stephen will reply with details but I know he did a comparison and evaluated different options. He tested with the JDBC IO itests.
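To make the unit-test strategy above concrete: failures like a flaky connection can be simulated deterministically inside a unit test, with no cluster involved. Below is a minimal, hypothetical Java sketch; none of these classes or names are Beam APIs, they only illustrate the pattern of a fake client that fails a fixed number of times plus bounded retry logic.

```java
// Hypothetical sketch: simulate a transient failure deterministically in a
// unit test. FlakyClient and readWithRetry are invented for illustration.
public class FlakyReadTest {

  /** A fake data-store client that fails the first N reads, then succeeds. */
  static class FlakyClient {
    private int failuresLeft;
    FlakyClient(int failures) { this.failuresLeft = failures; }
    String read() {
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new RuntimeException("simulated network failure");
      }
      return "record";
    }
  }

  /** Reads with simple bounded retry, as a connector's retry logic might. */
  static String readWithRetry(FlakyClient client, int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return client.read();
      } catch (RuntimeException e) {
        last = e;  // remember the failure and try again
      }
    }
    throw last;  // all attempts failed
  }

  public static void main(String[] args) {
    // Two simulated failures, three attempts: the read should still succeed.
    String result = readWithRetry(new FlakyClient(2), 3);
    if (!"record".equals(result)) throw new AssertionError(result);
    System.out.println("retry test passed");
  }
}
```

Because the failure count is fixed, the same conditions recur on every run, which is exactly what is hard to guarantee when killing real nodes in an IT.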
Regards JB

On Jan 18, 2017, at 08:26, "Ismaël Mejía" <[email protected]> wrote:

Thanks for your analysis Stephen, good arguments / references.

One quick question. Have you checked the APIs of both (Mesos/Kubernetes) to see if we can programmatically do more complex tests (I suppose so, but you don't mention how easy or if those are possible), for example to simulate a slow networking slave (to test stragglers), or to arbitrarily kill one slave (e.g. if I want to test the correct behavior of a runner/IO that is reading from it)?

Another missing point in the review is the availability of ready to play packages; I think in this area mesos/dcos seems more advanced, no? I haven't looked recently, but at least 6 months ago there were not many helm packages ready, for example to test kafka or the hadoop ecosystem stuff (hdfs, hbase, etc). Has this been improved? Preparing this is also a considerable amount of work; on the other hand, this could also be a chance to contribute to kubernetes.

Regards,
Ismaël

On Wed, Jan 18, 2017 at 2:36 AM, Stephen Sisk <[email protected]> wrote:

hi! I've been continuing this investigation, and have some more info to report, and hopefully we can start making some decisions.

To support performance testing, I've been investigating mesos+marathon and kubernetes for running data stores in their high availability mode. I have been examining features that kubernetes/mesos+marathon use to support this. Setting up a multi-node cluster in a high availability mode tends to be more expensive time-wise than the single node instances I've played around with in the past. Rather than do a full build out with both kubernetes and mesos, I'd like to pick one of the two options to build the prototype cluster with. If the prototype doesn't go well, we could still go back to the other option, but I'd like to change us from a mode of "let's look at all the options" to one of "here's the favorite, let's prove that works for us".
Below are the features that I've seen are important to multi-node instances of data stores. I'm sure other folks on the list have done this before, so feel free to pipe up if I'm missing a good solution to a problem.

DNS/Discovery
--------------------
Necessary for talking between nodes (eg, cassandra nodes all need to be able to talk to a set of seed nodes.)
* Kubernetes has built-in DNS/discovery between nodes.
* Mesos supports this via mesos-dns, which isn't a part of core mesos, but is in dcos, which is the mesos distribution I've been using and that I would expect us to use.

Instances properly distributed across nodes
------------------------------------------------------------
If multiple instances of a data source end up on the same underlying VM, we may not get good performance out of those instances since the underlying VM may be more taxed than other VMs.
* Kubernetes has a beta feature, StatefulSets [1], which allows for containers distributed so that there's one container per underlying machine (as well as a lot of other useful features like easy stable dns names.)
* Mesos can support this via the built-in UNIQUE constraint [2]

Load balancing
--------------------
Incoming requests from users need to be distributed to the various machines - this is important for many data stores' high availability modes.
* Kubernetes supports easily hooking up to an external load balancer when on a cloud (and can be configured to work with a built-in load balancer if not)
* Mesos supports this via marathon-lb [3], which is an installable package in DC/OS

Persistent Volumes tied to specific instances
------------------------------------------------------------
Databases often need persistent state (for example to store the data :), so it's an important part of running our service.
* Kubernetes StatefulSets supports this
* Mesos+marathon apps with persistent volumes supports this [4] [5]

As I mentioned above, I'd like to focus on either kubernetes or mesos for my investigation, and as I go further along, I'm seeing kubernetes as better suited to our needs.

(1) It supports more of the features we want out of the box, and with StatefulSets, Kubernetes handles them all together neatly - eg. DC/OS requires marathon-lb to be installed and mesos-dns to be configured.

(2) I'm also finding that there seem to be more examples of using kubernetes to solve the types of problems we're working on. This is somewhat subjective, but in my experience as I've tried to learn both kubernetes and mesos, I personally found it generally easier to get kubernetes running than mesos due to the tutorials/examples available for kubernetes.

(3) Lower cost of initial setup - as I discussed in a previous mail [6], kubernetes was far easier to get set up even when I knew the exact steps. Mesos took me around 27 steps [7], which involved a lot of config that was easy to get wrong (it took me about 5 tries to get all the steps correct in one go.) Kubernetes took me around 8 steps and very little config.

Given that, I'd like to focus my investigation/prototyping on Kubernetes. To be clear, it's fairly close and I think both Mesos and Kubernetes could support what we need, so if we run into issues with kubernetes, Mesos still seems like a viable option that we could fall back to.
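For concreteness, the StatefulSet features discussed in this thread (stable DNS names via a headless service, a persistent volume per pod) come together in a manifest roughly like the sketch below. This is an illustrative, untested fragment: the apiVersion reflects the beta API of that era, and the image, names, and sizes are invented.

```yaml
# Illustrative sketch only -- names, image, and sizes are invented,
# and the beta apiVersion is an assumption for the Kubernetes of this era.
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra        # headless service -> stable per-pod DNS names
  replicas: 3
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.9
        ports:
        - containerPort: 9042
  volumeClaimTemplates:         # one persistent volume claim per pod
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
```

On the Mesos side, the equivalent spread-across-machines behavior would come from Marathon's UNIQUE constraint (e.g. `["hostname", "UNIQUE"]` in the app definition), configured separately from DNS and load balancing.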
Thanks,
Stephen

[1] Kubernetes StatefulSets https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/
[2] mesos unique constraint - https://mesosphere.github.io/marathon/docs/constraints.html
[3] https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing.html and https://mesosphere.com/blog/2015/12/04/dcos-marathon-lb/
[4] https://mesosphere.github.io/marathon/docs/persistent-volumes.html
[5] https://dcos.io/docs/1.7/usage/tutorials/marathon/stateful-services/
[6] Container Orchestration software for hosting data stores https://lists.apache.org/thread.html/5825b35b895839d0b33b6c726c1de0e76bdb9653d1e913b1207c6c4d@%3Cdev.beam.apache.org%3E
[7] https://github.com/ssisk/beam/blob/support/support/mesos/setup.md

On Thu, Dec 29, 2016 at 5:44 PM Davor Bonaci <[email protected]> wrote:

Just a quick drive-by comment: how tests are laid out has non-trivial tradeoffs on how/where continuous integration runs, and how results are integrated into the tooling. The current state is certainly not ideal (e.g., due to multiple test executions some links in Jenkins point where they shouldn't), but most other alternatives had even bigger drawbacks at the time. If someone has great ideas that don't explode the number of modules, please share ;-)

On Mon, Dec 26, 2016 at 6:30 AM, Etienne Chauchot <[email protected]> wrote:

Hi Stephen,

Thanks for taking the time to comment.

My comments are below in the email:

Le 24/12/2016 à 00:07, Stephen Sisk a écrit :

hey Etienne -

thanks for your thoughts and thanks for sharing your experiences. I generally agree with what you're saying. Quick comments below:

IT are stored alongside with UT in src/test directory of the IO but they might go to dedicated module, waiting for a consensus

I don't have a strong opinion, and I don't feel that I've worked enough with maven to understand all the consequences - I'd love for someone with more maven experience to weigh in.
If this becomes blocking, I'd say check it in, and we can refactor later if it proves problematic.

Sure, not a blocking point, it could be refactored afterwards. Just as a reminder, JB mentioned that storing IT in a separate module allows to have more coherence between all IT (same behavior) and to do cross IO integration tests. JB, have you experienced some long term drawbacks of storing IT in a separate module, like, for example, more difficult maintenance due to "distance" with production code?

Also IMHO, it is better that tests load/clean data than doing some assumptions about the running order of the tests.

I definitely agree that we don't want to make assumptions about the running order of the tests - that way lies pain. :) It will be interesting to see how the performance tests work out since they will need more data (and thus loading data can take much longer.)

Yes, performance testing might push in the direction of data loading from outside the tests due to loading time.

This should also be an easier problem for read tests than for write tests - if we have long running instances, read tests don't really need cleanup. And if write tests only write a small amount of data, as long as we are sure we're writing to uniquely identifiable locations (ie, new table per test or something similar), we can clean up the write test data on a slower schedule.

I agree.

this will tend to go to the direction of long running data store instances rather than data store instances started (and optionally loaded) before tests.

It may be easiest to start with a "data stores stay running" implementation, and then if we see issues with that, move towards tests that start/stop the data stores on each run. One thing I'd like to make sure is that we're not manually tweaking the configurations for data stores. One way we could do that is to destroy/recreate the data stores on a slower schedule - maybe once per week.
That way if the script is changed or the data store instances are changed, we'd be able to detect it relatively soon while still removing the need for the tests to manage the data stores.

I agree. In addition to manual configuration tweaking, there might be cases in which a data store re-partitions data during a test or after some tests while the dataset changes. The IO must be tolerant to that, but the asserts (number of bundles for example) in the test must not fail in that case. I would also prefer, if possible, that the tests do not manage data stores (not set them up, not start them, not stop them).

as a general note, I suspect many of the folks in the states will be on holiday until Jan 2nd/3rd.

S

On Fri, Dec 23, 2016 at 7:48 AM Etienne Chauchot <[email protected]> wrote:

Hi,

Recently we had a discussion about integration tests of IOs. I'm preparing a PR for integration tests of the elasticSearch IO (https://github.com/echauchot/incubator-beam/tree/BEAM-1184-ELASTICSEARCH-IO as a first shot) which are very important IMHO because they helped catch some bugs that UT could not (volume, data store instance sharing, real data store instance ...).

I would like to have your thoughts/remarks about the points below. Some of these points are also discussed here https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.7ly6e7beup8a :

- UT and IT have a similar architecture, but while UT focus on testing the correct behavior of the code including corner cases and use an embedded in-memory data store, IT assume that the behavior is correct (strong UT) and focus on higher volume testing and testing against real data store instance(s)

- For now, IT are stored alongside with UT in the src/test directory of the IO but they might go to a dedicated module, waiting for a consensus. Maven is not configured to run them automatically because the data store is not available on the jenkins server yet

- For now, they only use DirectRunner, but they will be run against each runner.
- IT do not set up the data store instance (like stated in the above document); they assume that one is already running (hardcoded configuration in the test for now, waiting for a common solution to pass configuration to IT). A docker container script is provided in the contrib directory as a starting point for whatever orchestration software will be chosen.

- IT load and clean test data before and after each test if needed. It is simpler to do so because some tests need an empty data store (write test) and because, as discussed in the document, tests might not be the only users of the data store. Also IMHO, it is better that tests load/clean data than doing some assumptions about the running order of the tests.

If we generalize this pattern to all IT tests, this will tend to go in the direction of long running data store instances rather than data store instances started (and optionally loaded) before tests.

Besides, if we were to change our minds and load data from outside the tests, a logstash script is provided.

If you have any thoughts or remarks I'm all ears :)

Regards,

Etienne

Le 14/12/2016 à 17:07, Jean-Baptiste Onofré a écrit :

Hi Stephen,

the purpose of having it in a specific module is to share resources and apply the same behavior from an IT perspective and be able to have IT "cross" IO (for instance, reading from JMS and sending to Kafka; I think that's the key idea for integration tests).

For instance, in Karaf, we have:
- utest in each module
- itest module containing itests for all modules all together

Regards
JB

On 12/14/2016 04:59 PM, Stephen Sisk wrote:

Hi Etienne,

thanks for following up and answering my questions.

re: where to store integration tests - having them all in a separate module is an interesting idea. I couldn't find JB's comments about moving them into a separate module in the PR - can you share the reasons for doing so?
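As a concrete illustration of the load/clean-per-test pattern described above, here is a minimal, hypothetical Java sketch. The in-memory map stands in for a real, already-running, possibly shared data store instance; nothing here is Beam or Elasticsearch API.

```java
// Hypothetical sketch of the "each IT loads and cleans its own data" pattern.
// InMemoryStore stands in for a real, long-running, shared data store.
import java.util.HashMap;
import java.util.Map;

public class LoadCleanPattern {

  static class InMemoryStore {
    final Map<String, String> data = new HashMap<>();
  }

  /** A test loads the data it needs, runs, and cleans up afterwards. */
  static int readCountTest(InMemoryStore store) {
    try {
      // setup: load test data instead of assuming a prior test left it there
      store.data.put("k1", "v1");
      store.data.put("k2", "v2");
      // the "test body": assert against data this test loaded itself
      return store.data.size();
    } finally {
      // teardown: leave the shared instance as we found it
      store.data.clear();
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    int count = readCountTest(store);
    if (count != 2) throw new AssertionError("expected 2, got " + count);
    if (!store.data.isEmpty()) throw new AssertionError("store not cleaned");
    System.out.println("load/clean pattern ok");
  }
}
```

The point of the `try/finally` shape is that cleanup runs even when the test body fails, so no ordering assumptions leak into the next test.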
The IO integration/perf tests do seem like they'll need to be treated in a special manner, but given that there is already an IO specific module, it may just be that we need to treat all the ITs in the IO module the same way. I don't have strong opinions either way right now.

S

On Wed, Dec 14, 2016 at 2:39 AM Etienne Chauchot <[email protected]> wrote:

Hi guys,

@Stephen: I addressed all your comments directly in the PR, thanks! I just wanted to comment here about the docker image I used: the only official Elastic image contains only ElasticSearch. But for testing I needed logstash (for ingestion) and kibana (not for integration tests, but to easily test REST requests to ES using sense). This is why I use an ELK (Elasticsearch+Logstash+Kibana) image. This one is released under the apache 2 license.

Besides, there is also a point about where to store integration tests: JB proposed in the PR to store integration tests in a dedicated module rather than directly in the IO module (like I did).

Etienne

Le 01/12/2016 à 20:14, Stephen Sisk a écrit :

hey! thanks for sending this. I'm very excited to see this change. I added some detail-oriented code review comments in addition to what I've discussed here.

The general goal is to allow for re-usable instantiation of particular data store instances and this seems like a good start. Looks like you also have a script to generate test data for your tests - that's great.

The next steps (definitely not blocking your work) will be to have ways to create instances from the docker images you have here, and use them in the tests. We'll need support in the test framework for that since it'll be different on developer machines and in the beam jenkins cluster, but your scripts here allow someone running these tests locally to not have to worry about getting the instance set up and to manually adjust, so this is a good incremental step.
I have some thoughts now that I'm reviewing your scripts (that I didn't have previously, so we are learning this together):

* It may be useful to try and document why we chose a particular docker image as the base (ie, "this is the official supported elasticsearch docker image" or "this image has several data stores together that can be used for a couple different tests") - I'm curious as to whether the community thinks that is important

One thing that I called out in the comment that's worth mentioning on the larger list - if you want to specify which specific runners a test uses, that can be controlled in the pom for the module. I updated the testing doc mentioned previously in this thread with a TODO to talk about this more. I think we should also make it so that IO modules have that automatically, so developers don't have to worry about it.

S

On Thu, Dec 1, 2016 at 9:00 AM Etienne Chauchot <[email protected]> wrote:

Stephen,

As discussed, I added the injection script, docker container scripts and integration tests to the sdks/java/io/elasticsearch/contrib <https://github.com/apache/incubator-beam/pull/1439/files/1e7e2f0a6e1a1777d31ae2c886c920efccd708b5#diff-e243536428d06ade7d824cefcb3ed0b9> directory in that PR: https://github.com/apache/incubator-beam/pull/1439.

These work well but they are a first shot. Do you have any comments about those?

Besides, I am not very sure that these files should be in the IO itself (even in the contrib directory, out of maven source directories). Any thoughts?

Thanks,

Etienne

Le 23/11/2016 à 19:03, Stephen Sisk a écrit :

It's great to hear more experiences.

I'm also glad to hear that people see real value in the high volume/performance benchmark tests. I tried to capture that in the Testing doc I shared, under "Reasons for Beam Test Strategy". [1]

It does generally sound like we're in agreement here. Areas of discussion I see:

1.
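To make the pom-level runner selection concrete, a hedged sketch of what such configuration might look like is below. The plugin choice and the `beamTestPipelineOptions` property name are illustrative assumptions, not the actual Beam build configuration; the idea is simply that the module's pom passes runner options down to the integration tests.

```xml
<!-- Illustrative sketch only: the property name and exact plugin wiring
     are assumptions, not Beam's real build config. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <configuration>
    <systemPropertyVariables>
      <!-- hypothetical property the ITs could read to pick a runner -->
      <beamTestPipelineOptions>["--runner=DirectRunner"]</beamTestPipelineOptions>
    </systemPropertyVariables>
  </configuration>
</plugin>
```

A per-runner Maven profile could then override this property, so running the same ITs against another runner is a pom change rather than a code change.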
People like the idea of bringing up fresh instances for each test rather than keeping instances running all the time, since that ensures no contamination between tests. That seems reasonable to me. If we see flakiness in the tests, or we note that setting up/tearing down instances is taking a lot of time, we can revisit.

2. Deciding on cluster management software/orchestration software - I want to make sure we land on the right tool here since choosing the wrong tool could result in administration of the instances taking more work. I suspect that's a good place for a follow up discussion, so I'll start a separate thread on that. I'm happy with whatever tool we choose, but I want to make sure we take a moment to consider different options and have a reason for choosing one.

Etienne - thanks for being willing to port your creation/other scripts over. You might be a good early tester of whether this system works well for everyone.

Stephen

[1] Reasons for Beam Test Strategy - https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit?ts=58349aec#

On Wed, Nov 23, 2016 at 12:48 AM Jean-Baptiste Onofré <[email protected]> wrote:

I second Etienne there. We worked together on the ElasticsearchIO and definitely, the most valuable tests we did were integration tests with ES on docker and high volume.

I think we have to distinguish the two kinds of tests:

1. utests are located in the IO itself and basically they should cover the core behaviors of the IO

2. itests are located as contrib in the IO (they could be part of the IO but executed by the integration-test plugin or a specific profile) and deal with a "real" backend and high volumes. The resources required by the itests can be bootstrapped by Jenkins (for instance using Mesos/Marathon and docker images as already discussed; it's what I'm doing on my own "server"). It's basically what Stephen described.

We have to not rely only on itests: utests are very important and they validate the core behavior.
My $0.01 ;)

Regards
JB

On 11/23/2016 09:27 AM, Etienne Chauchot wrote:

Hi Stephen,

I like your proposition very much and I also agree that docker + some orchestration software would be great!

On the elasticsearch IO (PR to be created this week) there are docker container creation scripts and a logstash data ingestion script for the IT environment available in the contrib directory alongside the integration tests themselves. I'll be happy to make them compliant with the new IT environment.

What you say below about the need for an external IT environment is particularly true. As an example with ES, what came out in the first implementation was that there were problems starting at some high volume of data (timeouts, ES windowing overflow...) that could not have been seen on the embedded ES version. Also there were some particularities of an external instance, like secondary (replica) shards, that were not visible on an embedded instance.

Besides, I also favor bringing up instances before tests because it allows (amongst other things) to be sure to start on a fresh dataset for the test to be deterministic.

Etienne

Le 23/11/2016 à 02:00, Stephen Sisk a écrit :

Hi,

I'm excited we're getting lots of discussion going. There are many threads of conversation here; we may choose to split some of them off into a different email thread. I'm also betting I missed some of the questions in this thread, so apologies ahead of time for that. Also apologies for the amount of text; I provided some quick summaries at the top of each section.

Amit - thanks for your thoughts. I've responded in detail below.

Ismael - thanks for offering to help. There's plenty of work here to go around. I'll try and think about how we can divide up some next steps (probably in a separate thread.) The main next step I see is deciding between kubernetes/mesos+marathon/docker swarm - I'm working on that, but having lots of different thoughts on what the advantages/disadvantages of those are would be helpful (I'm not entirely sure of the protocol for collaborating on sub-projects like this.)
These issues are all related to what kind of tests we want to write. I think a kubernetes/mesos/swarm cluster could support all the use cases we've discussed here (and thus should not block moving forward with this), but understanding what we want to test will help us understand how the cluster will be used. I'm working on a proposed user guide for testing IO Transforms, and I'm going to send out a link to that + a short summary to the list shortly so folks can get a better sense of where I'm coming from.

Here's my thinking on the questions we've raised here -

Embedded versions of data stores for testing
--------------------
Summary: yes! But we still need real data stores to test against.

I am a gigantic fan of using embedded versions of the various data stores. I think we should test everything we possibly can using them, and do the majority of our correctness testing using embedded versions + the direct runner. However, it's also important to have at least one test that actually connects to an actual instance, so we can get coverage for things like credentials, real connection strings, etc...

The key point is that embedded versions definitely can't cover the performance tests, so we need to host instances if we want to test that. I consider the integration tests/performance benchmarks to be costly things that we do only for the IO transforms with large amounts of community support/usage. A random IO transform used by a few users doesn't necessarily need integration & perf tests, but for heavily used IO transforms, there's a lot of community value in these tests. The maintenance proposal below scales with the amount of community support for a particular IO transform.

Reusing data stores ("use the data stores across executions.")
------------------
Summary: I favor a hybrid approach: some frequently used, very small instances that we keep up all the time + larger multi-container data store instances that we spin up for perf tests.
I don't think we need to have a strong answer to this question, but I think we do need to know what range of capabilities we need, and use that to inform our requirements on the hosting infrastructure. I think kubernetes/mesos + docker can support all the scenarios I discuss below.

I had been thinking of a hybrid approach - reuse some instances and don't reuse others. Some tests require isolation from other tests (eg. performance benchmarking), while others can easily re-use the same database/data store instance over time, provided they are written in the correct manner (eg. a simple read or write correctness integration test).

To me, the question of whether to use one instance over time for a test vs spin up an instance for each test comes down to a tradeoff between these factors:

1. Flakiness of spin-up of an instance - if it's super flaky, we'll want to keep more instances up and running rather than bring them up/down. (this may also vary by the data store in question)

2. Frequency of testing - if we are running tests every 5 minutes, it may be wasteful to bring machines up/down every time. If we run tests once a day or week, it seems wasteful to keep the machines up the whole time.

3. Isolation requirements - If tests must be isolated, it means we either have to bring up the instances for each test, or we have to have some sort of signaling mechanism to indicate that a given instance is in use. I strongly favor bringing up an instance per test.

4. Number/size of containers - if we need a large number of machines for a particular test, keeping them running all the time will use more resources.

The major unknown to me is how flaky it'll be to spin these up. I'm hopeful/assuming they'll be pretty stable to bring up, but I think the best way to test that is to start doing it.
I suspect the sweet spot is the following: have a set of very small data store instances that stay up to support small-data-size post-commit end to end tests (post-commits run frequently and the data size means the instances would not use many resources), combined with the ability to spin up larger instances for once a day/week performance benchmarks (these use up more resources and are used less frequently.) That's the mix I'll propose in my docs on testing IO transforms. If spinning up new instances is cheap/non-flaky, I'd be fine with the idea of spinning up instances for each test.

Management ("what's the overhead of managing such a deployment")
--------------------
Summary: I propose that anyone can contribute scripts for setting up data store instances + integration/perf tests, but if the community doesn't maintain a particular data store's tests, we disable the tests and turn off the data store instances.

Management of these instances is a crucial question. First, let's break down what tasks we'll need to do on a recurring basis:

1. Ongoing maintenance (update to new versions, both instance & dependencies) - we don't want to have a lot of old versions vulnerable to attacks/buggy

2. Investigate breakages/regressions

(I'm betting there will be more things we'll discover - let me know if you have suggestions)

There's a couple goals I see:

1. We should only do sys admin work for things that give us a lot of benefit. (ie, don't build IT/perf/data store set up scripts for data stores without a large community)

2. We should do as much as possible of testing via in-memory/embedded testing (as you brought up).

3. Reduce the amount of manual administration overhead

As I discussed above, I think that integration tests/performance benchmarks are costly things that we should do only for the IO transforms with large amounts of community support/usage. Thus, I propose that we limit the IO transforms that get integration tests & performance benchmarks to those that have community support for maintaining the data store instances.
We can enforce this organically using some simple rules:

1. Investigating breakages/regressions: if a given integration/perf test starts failing and no one investigates it within a set period of time (a week?), we disable the tests and shut off the data store instances if we have instances running. When someone wants to step up and support it again, they can fix the test, check it in, and re-enable the test.

2. Ongoing maintenance: every N months, file a jira issue that is just "is the IO Transform X data store up to date?" - if the jira is not resolved in a set period of time (1 month?), the perf/integration tests are disabled, and the data store instances shut off.

This is pretty flexible -

* If a particular person or organization wants to support an IO transform, they can. If a group of people all organically organize to keep the tests running, they can.

* It can be mostly automated - there's not a lot of central organizing work that needs to be done.

Exposing the information about what IO transforms currently have running IT/perf benchmarks on the website will let users know what IO transforms are well supported.

I like this solution, but I also recognize this is a tricky problem. This is something the community needs to be supportive of, so I'm open to other thoughts.

Simulating failures in real nodes ("programmatic tests to simulate failure")
-----------------
Summary: 1) Focus our testing on the code in Beam 2) We should encourage a design pattern separating out network/retry logic from the main IO transform logic
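The design pattern named in that summary can be sketched in a few lines of Java. This is a hypothetical illustration, not Beam API: the network layer sits behind a narrow interface, so a unit test can inject a fake connection and exercise the transform logic without any real data store.

```java
// Hypothetical sketch of separating network logic from transform logic.
// Connection, normalize, and readAndNormalize are invented names.
public class SeparationSketch {

  /** Narrow interface for the network layer; tests inject a fake. */
  interface Connection {
    String fetch(String key);
  }

  /** The "transform logic": pure, and testable without any real network. */
  static String normalize(String raw) {
    return raw == null ? "" : raw.trim().toLowerCase();
  }

  /** Composition point: network layer is a parameter, not hardwired. */
  static String readAndNormalize(Connection conn, String key) {
    return normalize(conn.fetch(key));
  }

  public static void main(String[] args) {
    // In a unit test, a lambda stands in for a real data store connection.
    Connection fake = key -> "  VALUE-" + key + "  ";
    String out = readAndNormalize(fake, "42");
    if (!"value-42".equals(out)) throw new AssertionError(out);
    System.out.println(out);
  }
}
```

With this split, failure simulation (a `Connection` that throws) stays in fast, deterministic unit tests, while the hosted cluster is reserved for volume and end-to-end coverage.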
-- Jean-Baptiste Onofré [email protected] http://blog.nanthrax.net Talend - http://www.talend.com
