> Have you had a chance to deploy ducktests on bare metal?

Not yet, we are working on obtaining the servers.
On Thu, Jul 9, 2020 at 10:11 AM Max Shonichev <mshon...@yandex.ru> wrote:

Anton,

well, strange thing, but the clean-up and rerun helped.

Ubuntu 18.04

====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-06--003
run time:         4 minutes 44.835 seconds
tests run:        5
passed:           5
failed:           0
ignored:          0
====================================================================================================
test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     PASS
run time:   41.927 seconds
{"Rebalanced in (sec)": 1.02205491065979}
----------------------------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status:     PASS
run time:   51.985 seconds
{"Rebalanced in (sec)": 0.0760810375213623}
----------------------------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     PASS
run time:   1 minute 4.283 seconds
{"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst latency (ms)": "31035"}
----------------------------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status:     PASS
run time:   1 minute 13.089 seconds
{"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst latency (ms)": "139"}
----------------------------------------------------------------------------------------------------
test_id:    ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status:     PASS
run time:   53.332 seconds
----------------------------------------------------------------------------------------------------

MacBook

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-06--001
run time:         6 minutes 58.612 seconds
tests run:        5
passed:           5
failed:           0
ignored:          0
================================================================================
test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     PASS
run time:   48.724 seconds
{"Rebalanced in (sec)": 3.2574470043182373}
--------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
status:     PASS
run time:   1 minute 23.210 seconds
{"Rebalanced in (sec)": 2.165921211242676}
--------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     PASS
run time:   1 minute 12.659 seconds
{"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency (ms)": "31063"}
--------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
status:     PASS
run time:   1 minute 57.257 seconds
{"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst latency (ms)": "1010"}
--------------------------------------------------------------------------------
test_id:    ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
status:     PASS
run time:   1 minute 36.317 seconds
================================================================================

While the relative proportions remain the same across Ignite versions, the absolute numbers for Mac and Linux differ by more than a factor of two.

I'm finalizing the code for a 'local Tiden' appliance for your tests. The PR will be ready soon.

Have you had a chance to deploy ducktests on bare metal?

On 06.07.2020 14:27, Anton Vinogradov wrote:

Max,

Thanks for the check!

> Is it OK for those tests to fail?

No. I see really strange things in the logs. It looks like a concurrent ducktests run started unexpected services, and this broke the tests. Could you please clean up Docker (use the clean-up script [1]), compile the sources (use the build script [2]), and rerun the tests?

[1] https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
[2] https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh

On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Maxim.

Thanks for writing down the minutes.

There is no such thing as a «Nikolay team» on the dev-list. I propose to focus on product requirements and on what we want to gain from the framework instead of taking into account the needs of some team.

Can you, please, write down your version of the requirements, so we can reach a consensus on that and then move on to discussing the implementation?

On July 6, 2020, at 11:18, Max Shonichev <mshon...@yandex.ru> wrote:

Yes, Denis,

the common ground seems to be as follows: Anton Vinogradov and Nikolay Izhikov will try to prepare and run the PoC over physical hosts and share benchmark results. In the meantime, while I strongly believe that the dockerized approach to benchmarking is a road to misleading results and false positives, I'll prepare a PoC of Tiden in a dockerized environment to support the 'fast development prototyping' use case Nikolay's team insists on. It should be a matter of a few days.
As a side note, I've run Anton's PoC locally and would like some comments about the results:

Test system: Ubuntu 18.04, docker 19.03.6

Test commands:

    git clone -b ignite-ducktape g...@github.com:anton-vinogradov/ignite.git
    cd ignite
    mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
    cd modules/ducktests/tests/docker
    ./run_tests.sh

Test results:

====================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:       2020-07-05--004
run time:         7 minutes 36.360 seconds
tests run:        5
passed:           3
failed:           2
ignored:          0
====================================================================================================
test_id:    ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
status:     FAIL
run time:   3 minutes 12.232 seconds
----------------------------------------------------------------------------------------------------
test_id:    ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
status:     FAIL
run time:   1 minute 33.076 seconds

Is it OK for those tests to fail? Attached is the full test report (2020-07-05--004.tar.gz).

On 02.07.2020 17:46, Denis Magda wrote:

Folks,

Please share the summary of that Slack conversation here for the record once you find common ground.

-
Denis

On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Igniters.

All who are interested in the integration testing framework discussion are welcome in the Slack channel:
https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2

On July 2, 2020, at 13:06, Anton Vinogradov <a...@apache.org> wrote:

Max,

Thanks for joining us.

> 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts.

No. It is important to distinguish development, deployment, and orchestration. All-in-one solutions have extremely limited usability. As to ducktests:
- Docker is responsible for deployments during development.
- CI/CD is responsible for deployments during release and nightly checks. It's up to the team to choose AWS, VMs, bare metal, and even the OS.
- Ducktape is responsible for orchestration.

> 2. tiden can execute actions over remote nodes in real parallel fashion,
> while ducktape internally does all actions sequentially.

No. Ducktape may start any service in parallel. See the PME-free switch benchmark [1] for details.

> if we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef.

Sure, because the way of deployment depends on the infrastructure. How can we be sure that the OS we use and the restrictions we have will be compatible with Tiden?
> You have solved this deficiency with docker by putting all dependencies
> into one uber-image ...

and

> I guess we all know about docker hyped ability to run over distributed
> virtual networks.

It is very important not to confuse the tests' development environment (the Docker image you're talking about) with the real deployment.

> If we had stopped and started 5 nodes one-by-one, as ducktape does

All actions can be performed in parallel. See how ducktests [2] starts the cluster in parallel, for example.

[1] https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
[2] https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79

On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Maxim.

> 1. tiden can deploy artifacts by itself, while ducktape relies on
> dependencies being deployed by external scripts

Why do you think that maintaining deploy scripts coupled with the testing framework is an advantage? I thought we wanted to see and maintain deployment scripts separately from the testing framework.

> 2. tiden can execute actions over remote nodes in real parallel fashion,
> while ducktape internally does all actions sequentially.

Can you, please, clarify what actions you have in mind? And why do we want to execute them concurrently? Ignite node start and client application execution can be done concurrently with the ducktape approach.

> If we used ducktape solution we would have to instead prepare some
> deployment scripts to pre-initialize Sberbank hosts, for example, with
> Ansible or Chef

We shouldn't take some user's approach as an argument in this discussion. Let's discuss a general approach for all users of Ignite. Anyway, what is wrong with the external deployment script approach?

We, as a community, should provide several ways to run integration tests out-of-the-box AND the ability to customize deployment for the user's landscape.

> You have solved this deficiency with docker by putting all dependencies
> into one uber-image and that looks like simple and elegant solution
> however, that effectively limits you to single-host testing.

The Docker image should be used only by Ignite developers to test something locally. It's not intended for real-world testing.

The main issue with Tiden that I see is that it has been tested and maintained as a closed-source solution. This can lead to hard-to-solve problems when we start using and maintaining it as an open-source solution. For example, how many developers have used Tiden? And how many of those developers were not authors of Tiden itself?

On July 2, 2020, at 12:30, Max Shonichev <mshon...@yandex.ru> wrote:

Anton, Nikolay,

Let's agree on what we are arguing about: whether it is about "like or don't like" or about the technical properties of the suggested solutions.
If it is about likes and dislikes, then the whole discussion is meaningless. However, I hope together we can analyse the pros and cons carefully.

As far as I can understand now, the two main differences between ducktape and tiden are that:

1. tiden can deploy artifacts by itself, while ducktape relies on dependencies being deployed by external scripts.

2. tiden can execute actions over remote nodes in a real parallel fashion, while ducktape internally does all actions sequentially.

As for me, these are very important properties for a distributed testing framework.

The first property lets us easily reuse tiden in existing infrastructures. For example, during Zookeeper IEP testing at the Sberbank site we used the same tiden scripts that we use in our lab; the only change was putting a list of hosts into the config.

If we used the ducktape solution, we would instead have to prepare deployment scripts to pre-initialize the Sberbank hosts, for example, with Ansible or Chef.

You have solved this deficiency with docker by putting all dependencies into one uber-image, and that looks like a simple and elegant solution; however, it effectively limits you to single-host testing.

I guess we all know about docker's hyped ability to run over distributed virtual networks. We used to go that way, but quickly found that it is more hype than real work. In real environments there are problems with routing, DNS, multicast and broadcast traffic, and many others, that turn a docker-based distributed solution into a fragile, hard-to-maintain monster.

Please, if you believe otherwise, perform a run of your PoC over at least two physical hosts and share the results with us.

If you consider one physical docker host enough, please don't overlook that we want to run real-scale scenarios, with 50-100 cache groups, persistence enabled, and millions of keys loaded.

The practical limit for such configurations is 4-6 nodes per single physical host. Otherwise, tests become flaky due to resource starvation.

Please, if you believe otherwise, perform at least 10 runs of your PoC with other tests running on TC (we're targeting TeamCity, right?) and share the results, so we can check whether the numbers are reproducible.

I stress this once more: functional integration tests are OK to run in Docker and CI, but running benchmarks in Docker is a big NO GO.

The second property lets us write tests that require real-parallel actions over hosts.

For example, the agreed scenario for the PME benchmark during the "PME optimization stream" was as follows:

- 10 server nodes, preloaded with 1M keys
- 4 client nodes perform transactional load (client nodes physically separated from server nodes)
- during load:
  -- 5 server nodes are stopped in parallel
  -- after 1 minute, all 5 nodes are started in parallel
- load is stopped, logs are analysed for exchange times.

If we had stopped and started the 5 nodes one-by-one, as ducktape does, then the partition map exchange merge would not happen, and we could not have measured the PME optimizations for that case.
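[For illustration only: a minimal, framework-neutral Python sketch of the parallel stop/start step from the scenario above. The node objects and their start()/stop() methods are placeholders, not the API of either framework.]

    import time
    from concurrent.futures import ThreadPoolExecutor

    def restart_nodes_in_parallel(nodes, pause_sec=60):
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            # Stop all nodes concurrently; .result() re-raises any failure.
            for f in [pool.submit(n.stop) for n in nodes]:
                f.result()
            time.sleep(pause_sec)
            # Start all nodes concurrently, again waiting for completion.
            for f in [pool.submit(n.start) for n in nodes]:
                f.result()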
These are the limitations of ducktape that we consider a more important argument "against" than the arguments you provide "for".

On 30.06.2020 14:58, Anton Vinogradov wrote:

> Folks,
> First, I've created PR [1] with ducktests improvements.
> The PR contains the following changes:
> - PME-free switch proof-benchmark (2.7.6 vs master)
> - Ability to check (compare with) previous releases (e.g. 2.7.6 & 2.8)
> - Global refactoring
> -- benchmarks' Java code simplification
> -- services' Python and Java classes code deduplication
> -- fail-fast checks for Java and Python (e.g. an application should explicitly write that it finished with success)
> -- simple results extraction from tests and benchmarks
> -- Java code is now configurable from tests/benchmarks
> -- proper SIGTERM handling in Java code (e.g. it may finish the last operation and log results)
> -- the docker volume is now marked as delegated to increase execution speed for Mac and Windows users
> -- the Ignite cluster now starts in parallel (start speed-up)
> -- Ignite can be configured per test/benchmark
> - full and module assembly scripts added

Great job done! But let me remind you of one of the Apache Ignite principles: a week of thinking saves months of development.

> Second, I'd like to propose to accept ducktests [2] (ducktape integration) as the target "PoC check & real topology benchmarking tool".
> Ducktape pros:
> - Developed for distributed systems by distributed system developers.

So is Tiden.

> - Developed since 2014, stable.

Tiden is also pretty stable, and the development start date is not a good argument; for example, pytest has existed since 2004 and pytest-xdist (a plugin for distributed testing) since 2010, but we don't see them as an alternative at all.

> - Proven usability by usage at Kafka.

Tiden is proven usable by usage in GridGain and Sberbank deployments. The core, storage, SQL and tx teams use benchmark results provided by Tiden on a daily basis.

> - Dozens and dozens of tests and benchmarks at Kafka as a great example pack.

We'll donate some of our suites to Ignite, as I've mentioned in a previous letter.

> - Built-in Docker support for rapid development and checks.

False: there is no specific 'docker support' in ducktape itself; you just wrap it in docker by yourself, because ducktape lacks deployment abilities.

> - Great for CI automation.

False: there are no specific CI-enabled features in ducktape. Tiden, on the other hand, provides the generic xUnit reporting format, which is supported by both TeamCity and Jenkins. Also, instead of using private keys, Tiden can use an SSH agent, which is also great for CI, because both TeamCity and Jenkins store keys in secret storage available only to the ssh-agent and only for the duration of the test.
> As an additional motivation, at least 3 teams
> - IEP-45 team (to check the crash-recovery speed-up (discovery and Zabbix speed-up))
> - Ignite SE Plugins team (to check that plugin features do not slow down or break AI features)
> - Ignite SE QA team (to append already developed smoke/load/failover tests to the AI codebase)

Please, before recommending your tests to other teams, provide proof that your tests are reproducible in a real environment.

> now wait for the ducktest merge to start checking, in the AI way, the cases they are working on.
> Thoughts?

Let us review both solutions together: we'll try to run your tests in our lab, and you'll try to at least check out tiden and see whether the same tests can be implemented with it?

> [1] https://github.com/apache/ignite/pull/7967
> [2] https://github.com/apache/ignite/tree/ignite-ducktape

On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Maxim.

Thank you for such a detailed explanation. Can we put the content of this discussion somewhere on the wiki, so it doesn't get lost?

I divided the answer into several parts, from the requirements to the implementation. So, if we agree on the requirements, we can proceed with the discussion of the implementation.

1. Requirements:

The main goal I want to achieve is *reproducibility* of the tests. I'm sick and tired of the zillions of flaky, rarely failing, and almost-never-failing tests in the Ignite codebase. We should start with the simplest scenarios that will be as reliable as steel :)

I want to know for sure:
- Does this PR make rebalance quicker or not?
- Does this PR make PME quicker or not?

So your description of the complex test scenario looks like a next step to me. Anyway, it's cool that we already have one.

The second goal is to have a strict test lifecycle, as we have in JUnit and similar frameworks.

> It covers production-like deployment and running scenarios over a single database instance.

Do you mean «single cluster» or «single host»?

2. Existing tests:

> A Combinator suite allows to run a set of operations concurrently over a given database instance.
> A Consumption suite allows to run a set of production-like actions over a given set of Ignite/GridGain versions and compare test metrics across versions.
> A Yardstick suite.
> A Stress suite that simulates hardware environment degradation.
> An Ultimate, DR and Compatibility suites that perform functional regression testing.
> Regression.

Great news that we already have so many choices for testing! A mature test base is a big +1 for Tiden.
3. Comparison:

> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YaML config files, command line option for fine-grained test configuration, ability to select/modify tests behavior based on Ignite version.

1. Many YAML files can be hard to maintain.
2. In ducktape, you can set parameters via the «--parameters» option. Please take a look at the docs [1].

> Criteria: Cluster control
> Tiden: additionally can address cluster as a whole and execute remote commands in parallel.

It seems we have implemented this ability in the PoC already.

> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion helpers.
> Ducktape: simple asserts.

Can you, please, be more specific? What helpers do you have in mind? Ducktape has asserts that wait for logfile messages or for some process to finish.

> Criteria: Test reporting
> Ducktape: limited to its own text/HTML format

Ducktape has:
1. A text reporter
2. A customizable HTML reporter
3. A JSON reporter.
We can render the JSON with any template or tool.

> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for test needs. However, that means, that test can't be scaled without test code changes. Does not do any deploy, relies on external means, e.g. pre-packaged in docker image, as in PoC.

This is not true.
1. We can set explicit test parameters (node number) via parameters, so we can increase the client count or the cluster size without test code changes.
2. We have many choices for the test environment. These choices are tested and used in other projects:
   * docker
   * vagrant
   * private cloud (ssh access)
   * ec2
Please take a look at the Kafka documentation [2].

> I can continue more on this, but it should be enough for now:

We need to go deeper! :)

[1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
[2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
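[To illustrate the parametrization point: a minimal, hypothetical sketch of a parametrized ducktape test. @parametrize and wait_until are real ducktape 0.7.x APIs; the IgniteService class and its methods are placeholders, not actual PoC code. Defaults can then be overridden at run time without code changes, e.g. `ducktape ignitetest/ --parameters '{"version": "2.7.6"}'`.]

    # Sketch only: @parametrize and wait_until are ducktape 0.7.x APIs;
    # IgniteService and its methods are hypothetical placeholders.
    from ducktape.mark import parametrize
    from ducktape.tests.test import Test
    from ducktape.utils.util import wait_until

    class RebalanceTest(Test):
        @parametrize(version="2.8.1")
        @parametrize(version="dev")
        def test_add_node(self, version):
            ignite = IgniteService(self.test_context, num_nodes=2, version=version)
            ignite.start()
            ignite.add_node()  # hypothetical: trigger rebalance by scaling out
            # assert by waiting for a condition (e.g. a log message) instead
            # of sleeping for a fixed time
            wait_until(lambda: ignite.rebalance_finished(),
                       timeout_sec=300, backoff_sec=5,
                       err_msg="Rebalance did not finish in time")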
On June 9, 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru> wrote:

Greetings, Nikolay,

First of all, thank you for your great effort preparing a PoC of integration testing for the Ignite community.

It's a shame Ignite did not have at least some such tests yet; however, GridGain, as a major contributor to Apache Ignite, has had a profound collection of in-house tools to perform integration and performance testing for years already, and while we slowly consider sharing our expertise with the community, your initiative makes us drive that process a bit faster, thanks a lot!

I reviewed your PoC and want to share a little about what we do on our part, why and how; I hope it will help the community take the proper course.

First I'll do a brief overview of what decisions we made and what we have in our private code base, next I'll describe what we have already donated to the public and what we plan to publish next, and then I'll compare both approaches, highlighting deficiencies, in order to spur public discussion on the matter.

It might seem strange to use Python to run Bash to run Java applications, because that introduces the IT industry's 'best of breed' – the Python dependency hell – to a Java application code base. The only stranger decision one could make is to use Maven to run Docker to run Bash to run Python to run Bash to run Java, but desperate times call for desperate measures, I guess.

There are Java-based solutions for integration testing, e.g. Testcontainers [1], Arquillian [2], etc., and they might suit Ignite community CI pipelines well by themselves. But we also wanted to run performance tests and benchmarks, like the dreaded PME benchmark, and that is solved by a totally different set of tools in the Java world, e.g. JMeter [3], OpenJMH [4], Gatling [5], etc.

Speaking specifically about benchmarking, the Apache Ignite community already has Yardstick [6], and there's nothing wrong with writing a PME benchmark using Yardstick, but we also wanted to be able to run scenarios like this:
- put an X load onto an Ignite database;
- perform a Y set of operations to check how Ignite copes with operations under load.

And yes, we also wanted applications under test to be deployed 'like in production', i.e. distributed over a set of hosts. This raises questions about provisioning and node affinity, which I'll cover in detail later.

So we decided to put in a little effort to build a simple tool covering different integration and performance scenarios, and our QA lab's first attempt was PoC-Tester [7], currently open source except for the reporting web UI. It's a quite simple to use, 95% Java-based tool targeted at the pre-release QA stage.

It covers production-like deployment and running scenarios over a single database instance. PoC-Tester scenarios consist of a sequence of tasks running sequentially or in parallel. After all tasks complete, or at any time during the test, the user can run a log collection task; the logs are checked for exceptions, and a summary of the found issues and task ops/latency statistics is generated at the end of the scenario. One of the main PoC-Tester features is its fire-and-forget approach to task management. That is, you can deploy a grid and leave it running for weeks, periodically firing some tasks onto it.
During the earliest stages of PoC-Tester development it became quite clear that Java application development is a tedious process, and the architecture decisions you take during development are slow and hard to change.

For example, scenarios like this:
- deploy two instances of GridGain with master-slave data replication configured;
- put a load on the master;
- perform checks on the slave,
or like this:
- preload 1Tb of data, using your favorite tool of choice, into an Apache Ignite of version X;
- run a set of functional tests running Apache Ignite version Y over the preloaded data,
do not fit well into the PoC-Tester workflow.

So, this is why we decided to use Python as the generic scripting language of choice.

Pros:
- quicker prototyping and development cycles
- easier to find a DevOps/QA engineer with Python skills than one with Java skills
- used extensively all over the world for DevOps/CI pipelines, and thus has a rich set of libraries for all possible integration use cases.

Cons:
- Nightmare with dependencies. Better to stick to specific language/library versions.

Comparing alternatives for a Python-based testing framework, we considered the following requirements, somewhat similar to what you've mentioned for Confluent [8] previously:
- should be able to run locally or distributed (bare metal or in the cloud)
- should have built-in deployment facilities for applications under test
- should separate test configuration and test code:
-- be able to easily reconfigure tests by simple configuration changes
-- be able to easily scale the test environment by simple configuration changes
-- be able to perform regression testing by simply switching the artifacts under test via configuration
-- be able to run tests with different JDK versions by simple configuration changes
- should have human-readable reports and/or reporting tools integration
- should allow simple test progress monitoring; one does not want to run a 6-hour test only to find out that the application actually crashed during the first hour
- should allow parallel execution of test actions
- should have a clean API for test writers:
-- a clean API for distributed remote command execution
-- a clean API for starting/stopping deployed applications and other operations
-- a clean API for performing checks on results
- should be open source, or at least the source code should allow easy change or extension.

Back at that time we found no better alternative than to write our own framework, and here goes Tiden [9] as the GridGain framework of choice for functional integration and performance testing.
Pros:
- solves all the requirements above
Cons (for Ignite):
- (currently) closed GridGain source

On top of Tiden we've built a set of test suites, some of which you might have heard of already.

A Combinator suite allows running a set of operations concurrently over a given database instance. Proven to have found at least 30+ race conditions and NPE issues.

A Consumption suite allows running a set of production-like actions over a given set of Ignite/GridGain versions and comparing test metrics across versions, like heap/disk/CPU consumption and the time to perform actions, such as client PME, server PME, rebalancing time, data replication time, etc.

A Yardstick suite is a thin layer of Python glue code to run the Apache Ignite pre-release benchmark set. Yardstick itself has mediocre deployment capabilities; Tiden solves this easily.

A Stress suite simulates hardware environment degradation during testing.

The Ultimate, DR and Compatibility suites perform functional regression testing of GridGain Ultimate Edition features like snapshots, security, data replication, rolling upgrades, etc.

A Regression suite and some IEP testing suites, like IEP-14, IEP-15, etc., etc.

Most of the suites above use another in-house developed Java tool – PiClient – to perform the actual loading and miscellaneous operations with the Ignite under test. We use the py4j Python-Java gateway library to control PiClient instances from the tests.
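[For readers unfamiliar with py4j, the control pattern looks roughly like the sketch below. JavaGateway is the real py4j API; the entry-point method shown is a hypothetical placeholder, since PiClient is not public.]

    # Minimal py4j pattern sketch; loadCache is a hypothetical placeholder.
    from py4j.java_gateway import JavaGateway

    # Connects to a JVM running a py4j GatewayServer (e.g. a PiClient process).
    gateway = JavaGateway()
    client = gateway.entry_point          # Java object registered by the JVM side
    client.loadCache("cache1", 1000000)   # drive the Java load client from Python
    gateway.shutdown()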
When we considered CI, we put TeamCity out of scope, because distributed integration and performance tests tend to run for hours, and TeamCity agents are a scarce and costly resource. So, bundled with Tiden there are jenkins-job-builder [10] based CI pipelines and Jenkins xUnit reporting. Also, a rich web UI tool, Ward, aggregates test run reports across versions and has built-in visualization support for the Combinator suite.

All of the above is currently closed source, but we plan to make it public for the community, and publishing the Tiden core [9] is the first step on that way. You can review some examples of using Tiden for tests at my repository [11], for a start.

Now, let's compare the Ducktape PoC and Tiden.

Criteria: Language
Tiden: Python 3.7.
Ducktape: Python; proposes itself as Python 2.7, 3.6, 3.7 compatible, but actually can't work with Python 3.7 due to a broken Zmq dependency.
Comment: Python 3.7 has much better support for async-style code, which might be crucial for distributed application testing.
Score: Tiden: 1, Ducktape: 0

Criteria: Test writers API
The supported integration test framework concepts are basically the same:
- a test controller (test runner)
- a cluster
- a node
- an application (a service in Ducktape terms)
- a test
Score: Tiden: 5, Ducktape: 5

Criteria: Tests selection and run
Ducktape: suite-package-class-method level selection; an internal scheduler allows running the tests in a suite in parallel.
Tiden: also suite-package-class-method level selection; additionally allows selecting a subset of tests by attribute; parallel runs are not built in, but merging test reports from different runs is supported.
Score: Tiden: 2, Ducktape: 2

Criteria: Test configuration
Ducktape: single JSON string for all tests.
Tiden: any number of YAML config files, command line options for fine-grained test configuration, ability to select/modify test behavior based on the Ignite version.
Score: Tiden: 3, Ducktape: 1

Criteria: Cluster control
Ducktape: allows executing remote commands at node granularity.
Tiden: additionally can address the cluster as a whole and execute remote commands in parallel.
Score: Tiden: 2, Ducktape: 1

Criteria: Logs control
Both frameworks have similar built-in support for remote log collection and grepping. Tiden has a built-in plugin that can zip and collect arbitrary log files from arbitrary locations at test/module/suite granularity and unzip them if needed, plus an application API to search / wait for messages in logs. Ducktape lets each service declare its log file locations (it seemingly does not support log rollback) and has a single entry point to collect service logs.
Score: Tiden: 1, Ducktape: 1

Criteria: Test assertions
Tiden: simple asserts, plus a few customized assertion helpers.
Ducktape: simple asserts.
Score: Tiden: 2, Ducktape: 1

Criteria: Test reporting
Ducktape: limited to its own text/HTML format.
Tiden: provides a text report, a YAML report for reporting tools integration, and an XML xUnit report for integration with Jenkins/TeamCity.
Score: Tiden: 3, Ducktape: 1

Criteria: Provisioning and deployment
Ducktape: can provision a subset of hosts from the cluster for test needs. However, that means that a test can't be scaled without test code changes. Does not do any deployment; relies on external means, e.g. pre-packaged in a docker image, as in the PoC.
Tiden: given a set of hosts, Tiden uses all of them for the test. Provisioning should be done by external means. However, it provides conventional automated deployment routines.
Score: Tiden: 1, Ducktape: 1

Criteria: Documentation and Extensibility
Tiden: the current API documentation is limited; this should change as we go open source.
Tiden is easily extensible via hooks and plugins; see the example Maven plugin and Gatling application at [11].
Ducktape: basic documentation at readthedocs.io. The codebase is rigid; the framework core is tightly coupled and hard to change. The only possible extension mechanism is fork-and-rewrite.
Score: Tiden: 2, Ducktape: 1

I can continue more on this, but it should be enough for now:
Overall score: Tiden: 22, Ducktape: 14.

Time for discussion!

---
[1] https://www.testcontainers.org/
[2] http://arquillian.org/guides/getting_started/
[3] https://jmeter.apache.org/index.html
[4] https://openjdk.java.net/projects/code-tools/jmh/
[5] https://gatling.io/docs/current/
[6] https://github.com/gridgain/yardstick
[7] https://github.com/gridgain/poc-tester
[8] https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
[9] https://github.com/gridgain/tiden
[10] https://pypi.org/project/jenkins-job-builder/
[11] https://github.com/mshonichev/tiden_examples

On 25.05.2020 11:09, Nikolay Izhikov wrote:

Hello,

A branch with ducktape has been created: https://github.com/apache/ignite/tree/ignite-ducktape

Anyone willing to contribute to the PoC is welcome.

On May 21, 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com> wrote:

Hello, Denis.

There is no rush with these improvements. We can wait for Maxim's proposal and compare the two solutions :)

On May 21, 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:

Hi Nikolay,

Thanks for kicking off this conversation and sharing your findings with the results. That's the right initiative. I do agree that Ignite needs an integration testing framework with the capabilities you listed.

As we discussed privately, I would only check whether, instead of Confluent's Ducktape library, we can use the integration testing framework developed by GridGain for testing Ignite/GridGain clusters. That framework has been battle-tested and might be more convenient for Ignite-specific workloads. Let's wait for @Maksim Shonichev <mshonic...@gridgain.com>, who promised to join this thread once he finishes preparing the usage examples of the framework. To my knowledge, Max has already been working on that for several days.
-
Denis

On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Igniters.

I created a PoC [1] for the integration tests of Ignite.

Let me briefly explain the gap I want to cover:

1. For now, we don't have a solution for automated testing of Ignite on a «real cluster». By «real cluster» I mean a cluster «like in production»:
    * client and server nodes deployed on different hosts
    * thin clients performing queries from some other hosts
    * etc.

2. We don't have a solution for automated benchmarks of some internal Ignite processes:
    * PME
    * rebalance.
This means we don't know: do we perform rebalance (or PME) in 2.7.0 faster or slower than in 2.8.0 for the same cluster?

3. We don't have a solution for automated testing of Ignite integrations in a real-world environment. Ignite-Spark integration can be taken as an example. I think some ML solutions should also be tested in real-world deployments.

Solution:

I propose to use the ducktape library from Confluent (Apache 2.0 license). I tested it both on a real cluster (Yandex Cloud) and in the local environment (docker), and it works just fine.

The PoC contains the following services (a minimal service sketch follows below):

* A simple rebalance test:
        Start 2 server nodes,
        Create some data with an Ignite client,
        Start one more server node,
        Wait for the rebalance to finish.
* A simple Ignite-Spark integration test:
        Start 1 Spark master, start 1 Spark worker,
        Start 1 Ignite server node,
        Create some data with an Ignite client,
        Check the data in an application that queries it from Spark.

All tests are fully automated. Logs collection works just fine. You can see an example of the test report at [4].
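[For a flavor of what such a ducktape service looks like, here is a minimal sketch. The Service base class, the start_node/stop_node/clean_node hooks, and the logs declaration are ducktape 0.7.x concepts; the Ignite command line and the paths are simplified placeholders, not the actual PoC code.]

    # Sketch only: the Service hooks are ducktape API; commands and paths
    # are simplified placeholders.
    from ducktape.services.service import Service

    class IgniteService(Service):
        # log locations declared here are collected by ducktape after a run
        logs = {"ignite_log": {"path": "/mnt/ignite/logs", "collect_default": True}}

        def __init__(self, context, num_nodes):
            super(IgniteService, self).__init__(context, num_nodes=num_nodes)

        def start_node(self, node):
            # launch an Ignite node on the remote host in the background
            node.account.ssh("/opt/ignite/bin/ignite.sh /opt/ignite/config.xml "
                             ">> /mnt/ignite/logs/ignite.log 2>&1 &")

        def stop_node(self, node):
            node.account.kill_process("ignite", clean_shutdown=True)

        def clean_node(self, node):
            node.account.ssh("rm -rf /mnt/ignite", allow_fail=True)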
Pros:

* Ability to test local changes (no need to publish changes to some remote repository or similar).
* Ability to parametrize the test environment (run the same tests on different JDKs, JVM params, configs, etc.).
* Isolation by default, so system tests are as reliable as possible.
* Utilities for pulling up and tearing down services easily in clusters in different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos, Docker, cloud providers, etc.).
* Easy to write unit tests for distributed systems.
* Adopted and successfully used by another distributed open source project - Apache Kafka.
* Collects results (e.g. logs, console output).
* Reports results (e.g. expected conditions met, performance results, etc.).

WDYT?

[1] https://github.com/nizhikov/ignite/pull/15
[2] https://github.com/confluentinc/ducktape
[3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
[4] https://yadi.sk/d/JC8ciJZjrkdndg