Discussed privately with Max. The results of the discussion are available in the Slack channel [1].
[1] https://the-asf.slack.com/archives/C016F4PS8KV/p1595336751234500

On Wed, Jul 15, 2020 at 3:59 PM Max Shonichev <mshon...@yandex.ru> wrote:
> Anton, Nikolay,
>
> I want to share some more findings about ducktests that I stumbled upon
> while porting them to Tiden.
>
> The first problem is that GridGain Tiden-based tests by default use a
> real, production-like configuration for Ignite nodes, notably:
>
> - persistence enabled
> - ~120 caches in ~40 groups
> - a data set of around 1M keys per cache
> - primitive and POJO cache values
> - extensive use of query entities (indices)
>
> When I tried to run 4 nodes with such a configuration in Docker, my
> notebook nearly burned. Nevertheless, the grid started and worked OK,
> except for one little 'but': each successive version under test started
> slower and slower.
>
> 2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little slower, and your
> fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within
> the default 120-second timeout. In order to mimic the behavior of your
> tests, I had to turn off persistence and use only 1 cache as well.
>
> It's a pity that you completely ignore persistence and indices in your
> ducktests; otherwise you would quickly have run into the same
> limitation.
>
> I hope to adapt the Tiden Docker PoC to our TeamCity soon, and we'll
> try to git-bisect in order to find where this slowdown comes from.
> After that I'll file a bug in the Ignite Jira.
>
>
> Another problem with your rebalance benchmark is its low accuracy due
> to the granularity of measurements.
>
> You don't actually measure rebalance time; you measure the time it
> takes to find a specific string in the logs, which is confusing.
>
> The scenario of your test is as follows:
>
> 1. start 3 server nodes
> 2. start 1 data-loading client, preload the data, stop the client
> 3. start 1 more server node
> 4. wait till the server joins the topology
> 5. wait till this server node completes the exchange and writes the
>    'rebalanced=true, wasRebalanced=false' message to the log
> 6. report the time taken by step 5 as 'Rebalance time'
>
> The confusing thing here is the 'wait till' implementation: you
> actually re-scan the logs continuously, sleeping one second between
> scans, until the message appears. That means the measured rebalance
> time has at least one-second granularity, or even coarser, though it
> is reported with nanosecond precision.
>
> But with such a lightweight configuration (a single in-memory cache)
> and such a small data set (only 1M keys), rebalancing is very fast and
> usually completes in under 1 second, or only slightly slower.
>
> Before waiting for the rebalance message, you first wait for the
> topology message, and that wait also takes time to execute.
>
> So, by the time the Python part of the test performs its first scan of
> the logs, rebalancing is in most cases already done, and the time you
> report as '0.0760810375213623' is actually the time it takes to execute
> the log-scanning code.
>
> However, if rebalancing finishes just a little later after the topology
> update, then the first scan of the logs fails, you sleep for a whole
> second and rescan the logs, and there you get your message and report
> it as '1.02205491065979'.
>
> Under different conditions, a dockerized application may run a little
> slower or a little faster, depending on overall system load, free
> memory, etc. I tried to increase the load on my laptop by running a
> browser or a Maven build, and the time to scan the logs fluctuated
> from 0.02 to 0.09 or even 1.02 seconds.
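To make the granularity issue concrete: the 'wait till' logic Max describes boils down to a polling loop like the minimal sketch below. This is only an illustration of the described behavior, not the actual ducktests code; the helper names are made up, the one-second poll interval comes from the description above, and the '[HH:MM:SS,mmm]' log timestamp format in the second helper is an assumption. The second helper shows one possible accuracy fix: deriving the duration from the node's own log timestamps rather than from the test's wall clock.

import re
import time

def wait_for_log_message(log_path, pattern, timeout=120, poll_interval=1.0):
    """Poll a log file until a line matches `pattern`; return the wall-clock wait.

    If the message is already present, the result is just the cost of one
    scan (e.g. ~0.07 s); if it appears a moment later, the result jumps by
    a whole poll interval (e.g. ~1.02 s), even though it is returned with
    float precision.
    """
    start = time.time()
    regex = re.compile(pattern)
    while time.time() - start < timeout:
        with open(log_path) as log:
            if any(regex.search(line) for line in log):
                return time.time() - start
        time.sleep(poll_interval)
    raise TimeoutError(f"'{pattern}' not found within {timeout} s")

def rebalance_time_from_log(log_path):
    """Possible fix: compute the duration from the node's own log timestamps.

    Assumes a '[HH:MM:SS,mmm]' timestamp prefix (an assumption about the log
    layout) and the two log messages quoted in this thread.
    """
    ts_re = re.compile(r"\[(\d{2}):(\d{2}):(\d{2}),(\d{3})\]")
    topology_ts = rebalanced_ts = None
    with open(log_path) as log:
        for line in log:
            m = ts_re.search(line)
            if not m:
                continue
            h, mnt, s, ms = (int(g) for g in m.groups())
            ts = h * 3600 + mnt * 60 + s + ms / 1000.0
            if topology_ts is None and "Topology snapshot" in line:
                topology_ts = ts
            elif "rebalanced=true, wasRebalanced=false" in line:
                rebalanced_ts = ts
                break
    if topology_ts is None or rebalanced_ts is None:
        raise ValueError("expected log messages not found")
    return rebalanced_ts - topology_ts

With log-derived timestamps, the reported figure no longer depends on when the Python side happens to rescan the file, which speaks directly to the accuracy question below.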
> Note that in a CI environment, high system load from other tenants is
> quite an ordinary situation.
>
> Suppose we adopted the rebalance improvements and all versions after
> 2.9.0 performed within 1 second, just like 2.9.0 itself. Then your
> benchmark could report a false negative (e.g. 0.02 for master and 0.03
> for the PR), while on the next re-run it would pass (e.g. 0.07 for
> master and 0.03 for the PR). That's not quite the 'stable and
> non-flaky' test the Ignite community wants.
>
> What suggestions do you have to improve benchmark measurement accuracy?
>
>
> A third question is about the PME-free switch benchmark. Under some
> conditions, LongTxStreamerApplication actually hangs up PME. It needs
> to be investigated further, but this was either due to persistence
> being enabled or due to a missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false.
>
> Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX
> option? Also, have you performed a test of the PME-free switch with
> persistence-enabled caches?
>
>
> On 09.07.2020 10:11, Max Shonichev wrote:
> > Anton,
> >
> > well, strange thing, but a clean-up and rerun helped.
> >
> >
> > Ubuntu 18.04
> >
> > ====================================================================================================
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id: 2020-07-06--003
> > run time: 4 minutes 44.835 seconds
> > tests run: 5
> > passed: 5
> > failed: 0
> > ignored: 0
> > ====================================================================================================
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > status: PASS
> > run time: 41.927 seconds
> > {"Rebalanced in (sec)": 1.02205491065979}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> > status: PASS
> > run time: 51.985 seconds
> > {"Rebalanced in (sec)": 0.0760810375213623}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > status: PASS
> > run time: 1 minute 4.283 seconds
> > {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst
> > latency (ms)": "31035"}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> > status: PASS
> > run time: 1 minute 13.089 seconds
> > {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst
> > latency (ms)": "139"}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> > status: PASS
> > run time: 53.332 seconds
> > ----------------------------------------------------------------------------------------------------
> >
> >
> > MacBook
> > ================================================================================
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id: 2020-07-06--001
> > run time: 6 minutes 58.612 seconds
> > tests run: 5
> > passed: 5
> > failed: 0
> > ignored: 0
> > ================================================================================
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > status: PASS
> > run time: 48.724 seconds
> > {"Rebalanced in (sec)": 3.2574470043182373}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> > status: PASS
> > run time: 1 minute 23.210 seconds
> > {"Rebalanced in (sec)": 2.165921211242676}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > status: PASS
> > run time: 1 minute 12.659 seconds
> > {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency
> > (ms)": "31063"}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> > status: PASS
> > run time: 1 minute 57.257 seconds
> > {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst
> > latency (ms)": "1010"}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> > status: PASS
> > run time: 1 minute 36.317 seconds
> >
> > =============
> >
> > While the relative proportions remain the same across Ignite versions,
> > the absolute numbers for Mac/Linux differ by more than a factor of two.
> >
> > I'm finalizing the code for the 'local Tiden' appliance for your tests.
> > The PR will be ready soon.
> >
> > Have you had a chance to deploy ducktests on bare metal?
> >
> >
> > On 06.07.2020 14:27, Anton Vinogradov wrote:
> >> Max,
> >>
> >> Thanks for the check!
> >>
> >>> Is it OK for those tests to fail?
> >> No.
> >> I see really strange things in the logs.
> >> It looks like a concurrent ducktests run started unexpected services,
> >> and this broke the tests.
> >> Could you please clean up the Docker environment (use the clean-up
> >> script [1]), compile the sources (use the build script [2]), and rerun
> >> the tests.
> >>
> >> [1]
> >> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> >> [2]
> >> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
> >>
> >> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
> >> wrote:
> >>
> >>> Hello, Maxim.
> >>>
> >>> Thanks for writing down the minutes.
> >>>
> >>> There is no such thing as a «Nikolay team» on the dev list.
> >>> I propose to focus on product requirements and what we want to gain
> >>> from the framework instead of taking into account the needs of some
> >>> team.
> >>>
> >>> Can you, please, write down your version of the requirements so we
> >>> can reach a consensus on them and then move on to the discussion of
> >>> the implementation?
> >>>
> >>>> On July 6, 2020, at 11:18, Max Shonichev <mshon...@yandex.ru> wrote:
> >>>>
> >>>> Yes, Denis,
> >>>>
> >>>> the common ground seems to be as follows:
> >>>> Anton Vinogradov and Nikolay Izhikov will try to prepare and run the
> >>>> PoC over physical hosts and share benchmark results.
> >>>> In the meantime, while I strongly believe that a dockerized approach
> >>>> to benchmarking is a road to misleading results and false positives,
> >>>> I'll prepare a PoC of Tiden in a dockerized environment to support
> >>>> the 'fast development prototyping' use case Nikolay's team insists
> >>>> on. It should be a matter of a few days.
> >>>>
> >>>> As a side note, I've run Anton's PoC locally and would like to get
> >>>> some comments about the results:
> >>>>
> >>>> Test system: Ubuntu 18.04, docker 19.03.6
> >>>> Test commands:
> >>>>
> >>>> git clone -b ignite-ducktape g...@github.com:anton-vinogradov/ignite.git
> >>>> cd ignite
> >>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> >>>> cd modules/ducktests/tests/docker
> >>>> ./run_tests.sh
> >>>>
> >>>> Test results:
> >>>>
> >>>> ====================================================================================================
> >>>> SESSION REPORT (ALL TESTS)
> >>>> ducktape version: 0.7.7
> >>>> session_id: 2020-07-05--004
> >>>> run time: 7 minutes 36.360 seconds
> >>>> tests run: 5
> >>>> passed: 3
> >>>> failed: 2
> >>>> ignored: 0
> >>>> ====================================================================================================
> >>>> test_id:
> >>>> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> >>>> status: FAIL
> >>>> run time: 3 minutes 12.232 seconds
> >>>> ----------------------------------------------------------------------------------------------------
> >>>> test_id:
> >>>> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> >>>> status: FAIL
> >>>> run time: 1 minute 33.076 seconds
> >>>>
> >>>> Is it OK for those tests to fail? Attached is the full test report.
> >>>>
> >>>>
> >>>> On 02.07.2020 17:46, Denis Magda wrote:
> >>>>> Folks,
> >>>>> Please share the summary of that Slack conversation here for the
> >>>>> record once you find common ground.
> >>>>> -
> >>>>> Denis
> >>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>> Igniters.
> >>>>>>
> >>>>>> All who are interested in the integration testing framework
> >>>>>> discussion are welcome in the Slack channel -
> >>>>>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> >>>>>>
> >>>>>>> On July 2, 2020, at 13:06, Anton Vinogradov <a...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Max,
> >>>>>>> Thanks for joining us.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts.
> >>>>>>> No. It is important to distinguish development, deployment, and
> >>>>>>> orchestration. All-in-one solutions have extremely limited
> >>>>>>> usability.
> >>>>>>> As to Ducktests:
> >>>>>>> Docker is responsible for deployments during development.
> >>>>>>> CI/CD is responsible for deployments during release and nightly
> >>>>>>> checks. It's up to the team to choose AWS, VMs, bare metal, and
> >>>>>>> even the OS.
> >>>>>>> Ducktape is responsible for orchestration.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>> No. Ducktape may start any service in parallel. See the Pme-free
> >>>>>>> benchmark [1] for details.
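For reference, "start in parallel" in a test like this amounts to something along the lines of the sketch below: fire off all node starts at once, then wait for each to complete. The `nodes` objects and their `start()` method stand in for whatever service abstraction the framework provides; this is an illustration under assumed names, not ducktape's internal scheduler.

from concurrent.futures import ThreadPoolExecutor

def start_cluster_in_parallel(nodes):
    """Start all nodes concurrently and block until every start completes.

    `nodes` is any collection of objects with a blocking start() method
    (an assumed stand-in for the framework's service/node abstraction).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [pool.submit(node.start) for node in nodes]
        for future in futures:
            future.result()  # re-raises if any node failed to start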
> >>>>>>>> If we used the ducktape solution, we would have to instead
> >>>>>>>> prepare some deployment scripts to pre-initialize the Sberbank
> >>>>>>>> hosts, for example, with Ansible or Chef.
> >>>>>>> Sure, because the way of deployment depends on the infrastructure.
> >>>>>>> How can we be sure that the OS we use and the restrictions we have
> >>>>>>> will be compatible with Tiden?
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>>>> dependencies into one uber-image ...
> >>>>>>> and
> >>>>>>>> I guess we all know about docker's hyped ability to run over
> >>>>>>>> distributed virtual networks.
> >>>>>>> It is very important not to confuse test development (the docker
> >>>>>>> image you're talking about) and real deployment.
> >>>>>>>
> >>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> >>>>>>> All actions can be performed in parallel.
> >>>>>>> See how Ducktests [2] starts the cluster in parallel, for example.
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> >>>>>>> [2]
> >>>>>>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> >>>>>>>
> >>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>> Hello, Maxim.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts
> >>>>>>> Why do you think that maintaining deployment scripts coupled with
> >>>>>>> the testing framework is an advantage?
> >>>>>>> I thought we wanted to see and maintain deployment scripts
> >>>>>>> separately from the testing framework.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>> Can you, please, clarify what actions you have in mind?
> >>>>>>> And why would we want to execute them concurrently?
> >>>>>>> Ignite node start and client application execution can be done
> >>>>>>> concurrently with the ducktape approach.
> >>>>>>>
> >>>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >>>>>>>> with Ansible or Chef
> >>>>>>> We shouldn't take one user's approach as an argument in this
> >>>>>>> discussion. Let's discuss a general approach for all users of
> >>>>>>> Ignite. Anyway, what is wrong with the external deployment script
> >>>>>>> approach?
> >>>>>>>
> >>>>>>> We, as a community, should provide several ways to run integration
> >>>>>>> tests out-of-the-box AND the ability to customize deployment for
> >>>>>>> the user's landscape.
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>>>> dependencies into one uber-image and that looks like a simple and
> >>>>>>>> elegant solution however, that effectively limits you to
> >>>>>>>> single-host testing.
> >>>>>>> The Docker image should be used only by Ignite developers to test
> >>>>>>> something locally.
> >>>>>>> It's not intended for real-world testing.
> >>>>>>>
> >>>>>>> The main issue with Tiden that I see is that it is tested and
> >>>>>>> maintained as a closed-source solution.
> >>>>>>> This can lead to hard-to-solve problems when we start using and
> >>>>>>> maintaining it as an open-source solution.
> >>>>>>> For instance, how many developers have used Tiden? And how many of
> >>>>>>> those developers were not authors of Tiden itself?
> >>>>>>>
> >>>>>>>
> >>>>>>>> On July 2, 2020, at 12:30, Max Shonichev <mshon...@yandex.ru> wrote:
> >>>>>>>>
> >>>>>>>> Anton, Nikolay,
> >>>>>>>>
> >>>>>>>> Let's agree on what we are arguing about: whether it is about
> >>>>>>>> "like or don't like" or about the technical properties of the
> >>>>>>>> suggested solutions.
> >>>>>>>>
> >>>>>>>> If it is about likes and dislikes, then the whole discussion is
> >>>>>>>> meaningless. However, I hope together we can analyse the pros and
> >>>>>>>> cons carefully.
> >>>>>>>>
> >>>>>>>> As far as I can understand now, the two main differences between
> >>>>>>>> ducktape and tiden are that:
> >>>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts.
> >>>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>>>
> >>>>>>>> As for me, these are very important properties for a distributed
> >>>>>>>> testing framework.
> >>>>>>>>
> >>>>>>>> The first property lets us easily reuse tiden in existing
> >>>>>>>> infrastructures; for example, during Zookeeper IEP testing at the
> >>>>>>>> Sberbank site we used the same tiden scripts that we use in our
> >>>>>>>> lab; the only change was putting a list of hosts into the config.
> >>>>>>>>
> >>>>>>>> If we used the ducktape solution, we would have to instead prepare
> >>>>>>>> some deployment scripts to pre-initialize the Sberbank hosts, for
> >>>>>>>> example, with Ansible or Chef.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You have solved this deficiency with Docker by putting all
> >>>>>>>> dependencies into one uber-image, and that looks like a simple and
> >>>>>>>> elegant solution; however, it effectively limits you to
> >>>>>>>> single-host testing.
> >>>>>>>>
> >>>>>>>> I guess we all know about Docker's hyped ability to run over
> >>>>>>>> distributed virtual networks. We used to go that way, but quickly
> >>>>>>>> found that it is more hype than real work. In real environments,
> >>>>>>>> there are problems with routing, DNS, multicast and broadcast
> >>>>>>>> traffic, and many others, that turn a docker-based distributed
> >>>>>>>> solution into a fragile, hard-to-maintain monster.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform a run of your PoC over
> >>>>>>>> at least two physical hosts and share the results with us.
> >>>>>>>>
> >>>>>>>> If you consider one physical docker host enough, please don't
> >>>>>>>> overlook that we want to run real-scale scenarios, with 50-100
> >>>>>>>> cache groups, persistence enabled and millions of keys loaded.
> >>>>>>>>
> >>>>>>>> The practical limit for such configurations is 4-6 nodes per
> >>>>>>>> single physical host. Otherwise, tests become flaky due to
> >>>>>>>> resource starvation.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform at least 10 runs of
> >>>>>>>> your PoC with other tests running on TC (we're targeting TeamCity,
> >>>>>>>> right?) and share the results so we can check whether the numbers
> >>>>>>>> are reproducible.
> >>>>>>>> I stress this once more: functional integration tests are OK to
> >>>>>>>> run in Docker and CI, but running benchmarks in Docker is a big
> >>>>>>>> NO GO.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The second property lets us write tests that require real-parallel
> >>>>>>>> actions over hosts.
> >>>>>>>>
> >>>>>>>> For example, the agreed scenario for the PME benchmark during the
> >>>>>>>> "PME optimization stream" was as follows:
> >>>>>>>>
> >>>>>>>> - 10 server nodes, preloaded with 1M keys
> >>>>>>>> - 4 client nodes perform transactional load (client nodes
> >>>>>>>>   physically separated from server nodes)
> >>>>>>>> - during load:
> >>>>>>>> -- 5 server nodes are stopped in parallel
> >>>>>>>> -- after 1 minute, all 5 nodes are started in parallel
> >>>>>>>> - the load is stopped, and the logs are analysed for exchange
> >>>>>>>>   times.
> >>>>>>>>
> >>>>>>>> If we had stopped and started the 5 nodes one-by-one, as ducktape
> >>>>>>>> does, then the partition map exchange merge would not happen and
> >>>>>>>> we could not have measured the PME optimizations for that case.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> These are limitations of ducktape that we believe are a more
> >>>>>>>> important argument "against" than the ones you provide "for".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> >>>>>>>>> Folks,
> >>>>>>>>> First, I've created PR [1] with ducktests improvements.
> >>>>>>>>> The PR contains the following changes:
> >>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >>>>>>>>> - Ability to check (compare with) previous releases (e.g. 2.7.6
> >>>>>>>>>   & 2.8)
> >>>>>>>>> - Global refactoring
> >>>>>>>>> -- benchmarks javacode simplification
> >>>>>>>>> -- services python and java classes code deduplication
> >>>>>>>>> -- fail-fast checks for java and python (e.g. an application
> >>>>>>>>>    should explicitly write that it finished with success)
> >>>>>>>>> -- simple results extraction from tests and benchmarks
> >>>>>>>>> -- javacode now configurable from tests/benchmarks
> >>>>>>>>> -- proper SIGTERM handling in javacode (e.g. it may finish the
> >>>>>>>>>    last operation and log results)
> >>>>>>>>> -- docker volume now marked as delegated to increase execution
> >>>>>>>>>    speed for mac & win users
> >>>>>>>>> -- the Ignite cluster now starts in parallel (start speed-up)
> >>>>>>>>> -- Ignite can be configured at the test/benchmark
> >>>>>>>>> - full and module assembly scripts added
> >>>>>>>> Great job done! But let me recall one of the Apache Ignite
> >>>>>>>> principles: a week of thinking saves months of development.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Second, I'd like to propose to accept ducktests [2] (the ducktape
> >>>>>>>>> integration) as the target "PoC check & real topology
> >>>>>>>>> benchmarking tool".
> >>>>>>>>> Ducktape pros:
> >>>>>>>>> - Developed for distributed systems by distributed systems
> >>>>>>>>>   developers.
> >>>>>>>> So is Tiden.
> >>>>>>>>
> >>>>>>>>> - Developed since 2014, stable.
> >>>>>>>> Tiden is also pretty stable, and the development start date is not
> >>>>>>>> a good argument; for example, pytest dates back to 2004 and
> >>>>>>>> pytest-xdist (a plugin for distributed testing) to 2010, but we
> >>>>>>>> don't see them as an alternative at all.
> >>>>>>>>
> >>>>>>>>> - Proven usability by usage at Kafka.
> >>>>>>>> Tiden is proven usable by usage in GridGain and Sberbank
> >>>>>>>> deployments.
> >>>>>>>> The core, storage, SQL and tx teams use benchmark results provided
> >>>>>>>> by Tiden on a daily basis.
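Coming back to the PME scenario Max describes above: the step that matters is stopping, and later restarting, the 5 nodes at nearly the same moment, so that the resulting partition map exchanges can be merged into one; a sequential loop would trigger 5 separate exchanges and measure something else entirely. A hedged sketch of that orchestration step, with stop()/start() as assumed blocking node operations:

import time
from concurrent.futures import ThreadPoolExecutor

def stop_start_in_parallel(nodes, downtime_sec=60):
    """Stop half of the cluster in parallel, wait, then start it back.

    Near-simultaneous stops/starts are what allow exchange merging to
    kick in; node.stop()/node.start() are assumed operations used here
    purely for illustration.
    """
    victims = nodes[:len(nodes) // 2]
    with ThreadPoolExecutor(max_workers=max(1, len(victims))) as pool:
        list(pool.map(lambda node: node.stop(), victims))   # all stops at once
    time.sleep(downtime_sec)                                # keep them down for a minute
    with ThreadPoolExecutor(max_workers=max(1, len(victims))) as pool:
        list(pool.map(lambda node: node.start(), victims))  # all starts at once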
> >>>>>>>>> - Dozens and dozens of tests and benchmarks at Kafka as a great
> >>>>>>>>>   example pack.
> >>>>>>>> We'll donate some of our suites to Ignite, as I mentioned in my
> >>>>>>>> previous letter.
> >>>>>>>>
> >>>>>>>>> - Built-in Docker support for rapid development and checks.
> >>>>>>>> False, there's no specific 'docker support' in ducktape itself;
> >>>>>>>> you just wrap it in docker yourself, because ducktape lacks
> >>>>>>>> deployment abilities.
> >>>>>>>>
> >>>>>>>>> - Great for CI automation.
> >>>>>>>> False, there are no specific CI-enabled features in ducktape.
> >>>>>>>> Tiden, on the other hand, provides a generic xUnit reporting
> >>>>>>>> format, which is supported by both TeamCity and Jenkins. Also,
> >>>>>>>> instead of using private keys, Tiden can use an SSH agent, which
> >>>>>>>> is also great for CI, because both TeamCity and Jenkins store keys
> >>>>>>>> in secret storage available only to the ssh-agent and only for the
> >>>>>>>> time of the test.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> As additional motivation, at least 3 teams
> >>>>>>>>> - the IEP-45 team (to check the crash-recovery speed-up
> >>>>>>>>>   (discovery and Zabbix speed-up))
> >>>>>>>>> - the Ignite SE Plugins team (to check that plugin features do
> >>>>>>>>>   not slow down or break AI features)
> >>>>>>>>> - the Ignite SE QA team (to append already developed
> >>>>>>>>>   smoke/load/failover tests to the AI codebase)
> >>>>>>>> Please, before recommending your tests to other teams, provide
> >>>>>>>> proof that your tests are reproducible in a real environment.
> >>>>>>>>
> >>>>>>>>> now wait for the ducktests merge to start checking the cases they
> >>>>>>>>> are working on in the AI way.
> >>>>>>>>> Thoughts?
> >>>>>>>> Let us review both solutions together: we'll try to run your tests
> >>>>>>>> in our lab, and you'll try to at least check out tiden and see if
> >>>>>>>> the same tests can be implemented with it?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
> >>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>>>> Hello, Maxim.
> >>>>>>>>> Thank you for such a detailed explanation.
> >>>>>>>>> Can we put the content of this discussion somewhere on the wiki,
> >>>>>>>>> so it doesn't get lost?
> >>>>>>>>> I've divided the answer into several parts, from the requirements
> >>>>>>>>> to the implementation.
> >>>>>>>>> So, if we agree on the requirements, we can proceed with the
> >>>>>>>>> discussion of the implementation.
> >>>>>>>>> 1. Requirements:
> >>>>>>>>> The main goal I want to achieve is *reproducibility* of the
> >>>>>>>>> tests.
> >>>>>>>>> I'm sick and tired of the zillions of flaky, rarely failing, and
> >>>>>>>>> almost-never-failing tests in the Ignite codebase.
> >>>>>>>>> We should start with the simplest scenarios, which will be as
> >>>>>>>>> reliable as steel :)
> >>>>>>>>> I want to know for sure:
> >>>>>>>>> - Does this PR make rebalance quicker or not?
> >>>>>>>>> - Does this PR make PME quicker or not?
> >>>>>>>>> So, your description of the complex test scenario looks like a
> >>>>>>>>> next step to me.
> >>>>>>>>> Anyway, it's cool that we already have one.
> >>>>>>>>> The second goal is to have a strict test lifecycle, as we have in
> >>>>>>>>> JUnit and similar frameworks.
> >>>>>>>>> > It covers production-like deployment and running scenarios
> >>>>>>>>> > over a single database instance.
> >>>>>>>>> Do you mean a «single cluster» or a «single host»?
> >>>>>>>>> 2. Existing tests:
> >>>>>>>>> > A Combinator suite allows running a set of operations
> >>>>>>>>> > concurrently over a given database instance.
> >>>>>>>>> > A Consumption suite allows running a set of production-like
> >>>>>>>>> > actions over a given set of Ignite/GridGain versions and
> >>>>>>>>> > comparing test metrics across versions
> >>>>>>>>> > A Yardstick suite
> >>>>>>>>> > A Stress suite that simulates hardware environment degradation
> >>>>>>>>> > An Ultimate, DR and Compatibility suites that perform
> >>>>>>>>> > functional regression testing
> >>>>>>>>> > Regression
> >>>>>>>>> Great news that we already have so many choices for testing!
> >>>>>>>>> A mature test base is a big +1 for Tiden.
> >>>>>>>>> 3. Comparison:
> >>>>>>>>> > Criteria: Test configuration
> >>>>>>>>> > Ducktape: single JSON string for all tests
> >>>>>>>>> > Tiden: any number of YAML config files, command line options
> >>>>>>>>> > for fine-grained test configuration, ability to select/modify
> >>>>>>>>> > test behavior based on Ignite version.
> >>>>>>>>> 1. Many YAML files can be hard to maintain.
> >>>>>>>>> 2. In ducktape, you can set parameters via the «--parameters»
> >>>>>>>>> option. Please take a look at the doc [1].
> >>>>>>>>> > Criteria: Cluster control
> >>>>>>>>> > Tiden: additionally can address the cluster as a whole and
> >>>>>>>>> > execute remote commands in parallel.
> >>>>>>>>> It seems we have implemented this ability in the PoC already.
> >>>>>>>>> > Criteria: Test assertions
> >>>>>>>>> > Tiden: simple asserts, plus a few customized assertion helpers.
> >>>>>>>>> > Ducktape: simple asserts.
> >>>>>>>>> Can you, please, be more specific?
> >>>>>>>>> What helpers do you have in mind?
> >>>>>>>>> Ducktape has asserts that wait for log file messages or for a
> >>>>>>>>> process to finish.
> >>>>>>>>> > Criteria: Test reporting
> >>>>>>>>> > Ducktape: limited to its own text/HTML format
> >>>>>>>>> Ducktape has
> >>>>>>>>> 1. a text reporter
> >>>>>>>>> 2. a customizable HTML reporter
> >>>>>>>>> 3. a JSON reporter.
> >>>>>>>>> We can render the JSON with any template or tool.
> >>>>>>>>> > Criteria: Provisioning and deployment
> >>>>>>>>> > Ducktape: can provision a subset of hosts from the cluster for
> >>>>>>>>> > test needs. However, that means that a test can't be scaled
> >>>>>>>>> > without test code changes. Does not do any deployment; relies
> >>>>>>>>> > on external means, e.g. pre-packaged in a docker image, as in
> >>>>>>>>> > the PoC.
> >>>>>>>>> This is not true.
> >>>>>>>>> 1. We can set explicit test parameters (node number) via
> >>>>>>>>> parameters. We can increase the client count or cluster size
> >>>>>>>>> without test code changes.
> >>>>>>>>> 2. We have many choices for the test environment. These choices
> >>>>>>>>> are tested and used in other projects:
> >>>>>>>>> * docker
> >>>>>>>>> * vagrant
> >>>>>>>>> * private cloud (ssh access)
> >>>>>>>>> * ec2
> >>>>>>>>> Please take a look at the Kafka documentation [2].
> >>>>>>>>> > I can continue more on this, but it should be enough for now:
> >>>>>>>>> We need to go deeper! :)
> >>>>>>>>> [1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>>>>>>>> [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>>>>>>>>
> >>>>>>>>> > On June 9, 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru> wrote:
> >>>>>>>>> >
> >>>>>>>>> > Greetings, Nikolay,
> >>>>>>>>> >
> >>>>>>>>> > First of all, thank you for your great effort preparing a PoC
> >>>>>>>>> > of integration testing for the Ignite community.
> >>>>>>>>> >
> >>>>>>>>> > It's a shame Ignite did not have at least some such tests yet;
> >>>>>>>>> > however, GridGain, as a major contributor to Apache Ignite, has
> >>>>>>>>> > had a profound collection of in-house tools for integration and
> >>>>>>>>> > performance testing for years already, and while we have been
> >>>>>>>>> > slowly considering sharing our expertise with the community,
> >>>>>>>>> > your initiative makes us drive that process a bit faster,
> >>>>>>>>> > thanks a lot!
> >>>>>>>>> >
> >>>>>>>>> > I reviewed your PoC and want to share a little about what we do
> >>>>>>>>> > on our part, why and how; I hope it will help the community
> >>>>>>>>> > take the proper course.
> >>>>>>>>> >
> >>>>>>>>> > First I'll give a brief overview of what decisions we made and
> >>>>>>>>> > what we have in our private code base, next I'll describe what
> >>>>>>>>> > we have already donated to the public and what we plan to
> >>>>>>>>> > publish next, then I'll compare both approaches, highlighting
> >>>>>>>>> > deficiencies in order to spur public discussion on the matter.
> >>>>>>>>> >
> >>>>>>>>> > It might seem strange to use Python to run Bash to run Java
> >>>>>>>>> > applications, because that introduces the IT industry's 'best
> >>>>>>>>> > of breed', the Python dependency hell, into the Java
> >>>>>>>>> > application code base. The only stranger decision one could
> >>>>>>>>> > make is to use Maven to run Docker to run Bash to run Python to
> >>>>>>>>> > run Bash to run Java, but desperate times call for desperate
> >>>>>>>>> > measures, I guess.
> >>>>>>>>> >
> >>>>>>>>> > There are Java-based solutions for integration testing, e.g.
> >>>>>>>>> > Testcontainers [1], Arquillian [2], etc., and they might suit
> >>>>>>>>> > Ignite community CI pipelines well by themselves. But we also
> >>>>>>>>> > wanted to run performance tests and benchmarks, like the
> >>>>>>>>> > dreaded PME benchmark, and this is solved by a totally
> >>>>>>>>> > different set of tools in the Java world, e.g. JMeter [3],
> >>>>>>>>> > OpenJMH [4], Gatling [5], etc.
> >>>>>>>>> >
> >>>>>>>>> > Speaking specifically about benchmarking, the Apache Ignite
> >>>>>>>>> > community already has Yardstick [6], and there's nothing wrong
> >>>>>>>>> > with writing a PME benchmark using Yardstick, but we also
> >>>>>>>>> > wanted to be able to run scenarios like this:
> >>>>>>>>> > - put load X on an Ignite database;
> >>>>>>>>> > - perform a set of operations Y to check how Ignite copes with
> >>>>>>>>> >   operations under load.
> >>>>>>>>> >
> >>>>>>>>> > And yes, we also wanted the applications under test to be
> >>>>>>>>> > deployed 'like in production', e.g. distributed over a set of
> >>>>>>>>> > hosts.
> >>>>>>>>> > This raises questions about provisioning and node affinity,
> >>>>>>>>> > which I'll cover in detail later.
> >>>>>>>>> >
> >>>>>>>>> > So we decided to put in a little effort to build a simple tool
> >>>>>>>>> > to cover different integration and performance scenarios, and
> >>>>>>>>> > our QA lab's first attempt was PoC-Tester [7], currently open
> >>>>>>>>> > source except for the reporting web UI. It's a quite
> >>>>>>>>> > simple-to-use, 95% Java-based tool targeted at the pre-release
> >>>>>>>>> > QA stage.
> >>>>>>>>> >
> >>>>>>>>> > It covers production-like deployment and running scenarios over
> >>>>>>>>> > a single database instance. PoC-Tester scenarios consist of a
> >>>>>>>>> > sequence of tasks running sequentially or in parallel. After
> >>>>>>>>> > all tasks complete, or at any time during the test, the user
> >>>>>>>>> > can run a log collection task; the logs are checked for
> >>>>>>>>> > exceptions, and a summary of the found issues and task
> >>>>>>>>> > ops/latency statistics is generated at the end of the scenario.
> >>>>>>>>> > One of the main PoC-Tester features is its fire-and-forget
> >>>>>>>>> > approach to task management. That is, you can deploy a grid and
> >>>>>>>>> > leave it running for weeks, periodically firing some tasks onto
> >>>>>>>>> > it.
> >>>>>>>>> >
> >>>>>>>>> > During the earliest stages of PoC-Tester development it became
> >>>>>>>>> > quite clear that Java application development is a tedious
> >>>>>>>>> > process, and the architecture decisions you take during
> >>>>>>>>> > development are slow and hard to change.
> >>>>>>>>> > For example, scenarios like this
> >>>>>>>>> > - deploy two instances of GridGain with master-slave data
> >>>>>>>>> >   replication configured;
> >>>>>>>>> > - put a load on the master;
> >>>>>>>>> > - perform checks on the slave,
> >>>>>>>>> > or like this:
> >>>>>>>>> > - preload 1 TB of data into Apache Ignite version X using your
> >>>>>>>>> >   favorite tool of choice;
> >>>>>>>>> > - run a set of functional tests on Apache Ignite version Y over
> >>>>>>>>> >   the preloaded data,
> >>>>>>>>> > do not fit well into the PoC-Tester workflow.
> >>>>>>>>> >
> >>>>>>>>> > So, this is why we decided to use Python as our generic
> >>>>>>>>> > scripting language of choice.
> >>>>>>>>> >
> >>>>>>>>> > Pros:
> >>>>>>>>> > - quicker prototyping and development cycles
> >>>>>>>>> > - easier to find a DevOps/QA engineer with Python skills than
> >>>>>>>>> >   one with Java skills
> >>>>>>>>> > - used extensively all over the world for DevOps/CI pipelines
> >>>>>>>>> >   and thus has a rich set of libraries for all possible
> >>>>>>>>> >   integration use cases.
> >>>>>>>>> >
> >>>>>>>>> > Cons:
> >>>>>>>>> > - a nightmare with dependencies; better to stick to specific
> >>>>>>>>> >   language/library versions.
> >>>>>>>>> > Comparing alternatives for a Python-based testing framework,
> >>>>>>>>> > we considered the following requirements, somewhat similar to
> >>>>>>>>> > what you mentioned for Confluent [8] previously:
> >>>>>>>>> > - should be able to run locally or distributed (bare metal or
> >>>>>>>>> >   in the cloud)
> >>>>>>>>> > - should have built-in deployment facilities for applications
> >>>>>>>>> >   under test
> >>>>>>>>> > - should separate test configuration and test code
> >>>>>>>>> > -- be able to easily reconfigure tests by simple configuration
> >>>>>>>>> >    changes
> >>>>>>>>> > -- be able to easily scale the test environment by simple
> >>>>>>>>> >    configuration changes
> >>>>>>>>> > -- be able to perform regression testing by simply switching
> >>>>>>>>> >    the artifacts under test via configuration
> >>>>>>>>> > -- be able to run tests with different JDK versions by simple
> >>>>>>>>> >    configuration changes
> >>>>>>>>> > - should have human-readable reports and/or reporting tools
> >>>>>>>>> >   integration
> >>>>>>>>> > - should allow simple test progress monitoring; one does not
> >>>>>>>>> >   want to run a 6-hour test only to find out that the
> >>>>>>>>> >   application actually crashed during the first hour
> >>>>>>>>> > - should allow parallel execution of test actions
> >>>>>>>>> > - should have a clean API for test writers
> >>>>>>>>> > -- a clean API for distributed remote command execution
> >>>>>>>>> > -- a clean API for starting/stopping deployed applications and
> >>>>>>>>> >    other operations
> >>>>>>>>> > -- a clean API for performing checks on results
> >>>>>>>>> > - should be open source, or at least the source code should
> >>>>>>>>> >   allow easy change or extension
> >>>>>>>>> >
> >>>>>>>>> > Back at that time we found no better alternative than to write
> >>>>>>>>> > our own framework, and here comes Tiden [9] as GridGain's
> >>>>>>>>> > framework of choice for functional integration and performance
> >>>>>>>>> > testing.
> >>>>>>>>> >
> >>>>>>>>> > Pros:
> >>>>>>>>> > - solves all the requirements above
> >>>>>>>>> > Cons (for Ignite):
> >>>>>>>>> > - (currently) closed GridGain source
> >>>>>>>>> >
> >>>>>>>>> > On top of Tiden we've built a set of test suites, some of which
> >>>>>>>>> > you might have heard of already.
> >>>>>>>>> >
> >>>>>>>>> > A Combinator suite allows running a set of operations
> >>>>>>>>> > concurrently over a given database instance. Proven to find at
> >>>>>>>>> > least 30+ race conditions and NPE issues.
> >>>>>>>>> >
> >>>>>>>>> > A Consumption suite allows running a set of production-like
> >>>>>>>>> > actions over a given set of Ignite/GridGain versions and
> >>>>>>>>> > comparing test metrics across versions, like heap/disk/CPU
> >>>>>>>>> > consumption and the time to perform actions such as client PME,
> >>>>>>>>> > server PME, rebalancing, data replication, etc.
> >>>>>>>>> >
> >>>>>>>>> > A Yardstick suite is a thin layer of Python glue code to run
> >>>>>>>>> > the Apache Ignite pre-release benchmark set. Yardstick itself
> >>>>>>>>> > has mediocre deployment capabilities; Tiden solves this easily.
> >>>>>>>>> >
> >>>>>>>>> > A Stress suite simulates hardware environment degradation
> >>>>>>>>> > during testing.
> >>>>>>>>> > Ultimate, DR and Compatibility suites perform functional
> >>>>>>>>> > regression testing of GridGain Ultimate Edition features like
> >>>>>>>>> > snapshots, security, data replication, rolling upgrades, etc.
> >>>>>>>>> >
> >>>>>>>>> > A Regression suite and some IEP testing suites, like IEP-14,
> >>>>>>>>> > IEP-15, etc.
> >>>>>>>>> >
> >>>>>>>>> > Most of the suites above use another in-house developed Java
> >>>>>>>>> > tool, PiClient, to perform the actual loading and miscellaneous
> >>>>>>>>> > operations with the Ignite under test. We use the py4j
> >>>>>>>>> > Python-Java gateway library to control PiClient instances from
> >>>>>>>>> > the tests.
> >>>>>>>>> >
> >>>>>>>>> > When we considered CI, we put TeamCity out of scope, because
> >>>>>>>>> > distributed integration and performance tests tend to run for
> >>>>>>>>> > hours, and TeamCity agents are a scarce and costly resource.
> >>>>>>>>> > So, bundled with Tiden there are jenkins-job-builder [10] based
> >>>>>>>>> > CI pipelines and Jenkins xUnit reporting. Also, a rich web UI
> >>>>>>>>> > tool, Ward, aggregates test run reports across versions and has
> >>>>>>>>> > built-in visualization support for the Combinator suite.
> >>>>>>>>> >
> >>>>>>>>> > All of the above is currently closed source, but we plan to
> >>>>>>>>> > make it public for the community, and publishing the Tiden core
> >>>>>>>>> > [9] is the first step on that way. You can review some examples
> >>>>>>>>> > of using Tiden for tests in my repository [11], for a start.
> >>>>>>>>> >
> >>>>>>>>> > Now, let's compare the Ducktape PoC and Tiden.
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Language
> >>>>>>>>> > Tiden: Python 3.7
> >>>>>>>>> > Ducktape: Python; proposes itself as Python 2.7, 3.6, 3.7
> >>>>>>>>> > compatible, but actually can't work with Python 3.7 due to a
> >>>>>>>>> > broken Zmq dependency.
> >>>>>>>>> > Comment: Python 3.7 has much better support for async-style
> >>>>>>>>> > code, which might be crucial for distributed application
> >>>>>>>>> > testing.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 0
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test writers API
> >>>>>>>>> > The supported integration test framework concepts are basically
> >>>>>>>>> > the same:
> >>>>>>>>> > - a test controller (test runner)
> >>>>>>>>> > - a cluster
> >>>>>>>>> > - a node
> >>>>>>>>> > - an application (a service in Ducktape terms)
> >>>>>>>>> > - a test
> >>>>>>>>> > Score: Tiden: 5, Ducktape: 5
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test selection and runs
> >>>>>>>>> > Ducktape: suite-package-class-method level selection; an
> >>>>>>>>> > internal scheduler allows running the tests in a suite in
> >>>>>>>>> > parallel.
> >>>>>>>>> > Tiden: also suite-package-class-method level selection;
> >>>>>>>>> > additionally allows selecting a subset of tests by attribute;
> >>>>>>>>> > parallel runs are not built in, but it allows merging test
> >>>>>>>>> > reports from different runs.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 2
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test configuration
> >>>>>>>>> > Ducktape: a single JSON string for all tests
> >>>>>>>>> > Tiden: any number of YAML config files, command line options
> >>>>>>>>> > for fine-grained test configuration, and the ability to
> >>>>>>>>> > select/modify test behavior based on the Ignite version.
> >>>>>>>>> > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Cluster control
> >>>>>>>>> > Ducktape: allows executing remote commands at node granularity
> >>>>>>>>> > Tiden: additionally can address the cluster as a whole and
> >>>>>>>>> > execute remote commands in parallel.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Logs control
> >>>>>>>>> > Both frameworks have similar built-in support for remote log
> >>>>>>>>> > collection and grepping. Tiden has a built-in plugin that can
> >>>>>>>>> > zip and collect arbitrary log files from arbitrary locations at
> >>>>>>>>> > test/module/suite granularity and unzip them if needed, plus an
> >>>>>>>>> > application API to search / wait for messages in logs. Ducktape
> >>>>>>>>> > allows each service to declare its log file locations
> >>>>>>>>> > (seemingly does not support log rollover), and has a single
> >>>>>>>>> > entrypoint to collect service logs.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test assertions
> >>>>>>>>> > Tiden: simple asserts, plus a few customized assertion helpers.
> >>>>>>>>> > Ducktape: simple asserts.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test reporting
> >>>>>>>>> > Ducktape: limited to its own text/HTML format
> >>>>>>>>> > Tiden: provides a text report, a YAML report for reporting
> >>>>>>>>> > tools integration, and an XML xUnit report for integration with
> >>>>>>>>> > Jenkins/TeamCity.
> >>>>>>>>> > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Provisioning and deployment
> >>>>>>>>> > Ducktape: can provision a subset of hosts from the cluster for
> >>>>>>>>> > test needs. However, that means that a test can't be scaled
> >>>>>>>>> > without test code changes. Does not do any deployment; relies
> >>>>>>>>> > on external means, e.g. pre-packaged in a docker image, as in
> >>>>>>>>> > the PoC.
> >>>>>>>>> > Tiden: given a set of hosts, Tiden uses all of them for the
> >>>>>>>>> > test. Provisioning should be done by external means. However,
> >>>>>>>>> > it provides conventional automated deployment routines.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Documentation and extensibility
> >>>>>>>>> > Tiden: the current API documentation is limited; this should
> >>>>>>>>> > change as we go open source. Tiden is easily extensible via
> >>>>>>>>> > hooks and plugins; see the example Maven plugin and Gatling
> >>>>>>>>> > application at [11].
> >>>>>>>>> > Ducktape: basic documentation at readthedocs.io. The codebase
> >>>>>>>>> > is rigid; the framework core is tightly coupled and hard to
> >>>>>>>>> > change. The only possible extension mechanism is
> >>>>>>>>> > fork-and-rewrite.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > I can continue more on this, but it should be enough for now:
> >>>>>>>>> > Overall score: Tiden: 22, Ducktape: 14.
> >>>>>>>>> >
> >>>>>>>>> > Time for discussion!
> >>>>>>>>> >
> >>>>>>>>> > ---
> >>>>>>>>> > [1] - https://www.testcontainers.org/
> >>>>>>>>> > [2] - http://arquillian.org/guides/getting_started/
> >>>>>>>>> > [3] - https://jmeter.apache.org/index.html
> >>>>>>>>> > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>>>> > [5] - https://gatling.io/docs/current/
> >>>>>>>>> > [6] - https://github.com/gridgain/yardstick
> >>>>>>>>> > [7] - https://github.com/gridgain/poc-tester
> >>>>>>>>> > [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>>>>>>>> > [9] - https://github.com/gridgain/tiden
> >>>>>>>>> > [10] - https://pypi.org/project/jenkins-job-builder/
> >>>>>>>>> > [11] - https://github.com/mshonichev/tiden_examples
> >>>>>>>>> >
> >>>>>>>>> > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>>>>>>>> >> Hello,
> >>>>>>>>> >>
> >>>>>>>>> >> The branch with ducktape has been created -
> >>>>>>>>> >> https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>> >>
> >>>>>>>>> >> Anyone who is willing to contribute to the PoC is welcome.
> >>>>>>>>> >>
> >>>>>>>>> >>> On May 21, 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com> wrote:
> >>>>>>>>> >>>
> >>>>>>>>> >>> Hello, Denis.
> >>>>>>>>> >>>
> >>>>>>>>> >>> There is no rush with these improvements.
> >>>>>>>>> >>> We can wait for Maxim's proposal and compare the two
> >>>>>>>>> >>> solutions :)
> >>>>>>>>> >>>
> >>>>>>>>> >>>> On May 21, 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Hi Nikolay,
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Thanks for kicking off this conversation and sharing your
> >>>>>>>>> >>>> findings with the results. That's the right initiative. I do
> >>>>>>>>> >>>> agree that Ignite needs to have an integration testing
> >>>>>>>>> >>>> framework with the capabilities you listed.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> As we discussed privately, I would only check whether,
> >>>>>>>>> >>>> instead of Confluent's Ducktape library, we can use an
> >>>>>>>>> >>>> integration testing framework developed by GridGain for
> >>>>>>>>> >>>> testing Ignite/GridGain clusters. That framework has been
> >>>>>>>>> >>>> battle-tested and might be more convenient for
> >>>>>>>>> >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>>>>>>> >>>> <mshonic...@gridgain.com>, who promised to join this thread
> >>>>>>>>> >>>> once he finishes preparing the usage examples of the
> >>>>>>>>> >>>> framework. To my knowledge, Max has already been working on
> >>>>>>>>> >>>> that for several days.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> -
> >>>>>>>>> >>>> Denis
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>>>> >>>>
> >>>>>>>>> >>>>> Hello, Igniters.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Let me briefly explain the gap I want to cover:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 1. For now, we don't have a solution for automated testing
> >>>>>>>>> >>>>> of Ignite on a «real cluster».
> >>>>>>>>> >>>>> By «real cluster» I mean a cluster «like production»:
> >>>>>>>>> >>>>> * client and server nodes deployed on different hosts
> >>>>>>>>> >>>>> * thin clients performing queries from some other hosts
> >>>>>>>>> >>>>> * etc.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 2. We don't have a solution for automated benchmarks of
> >>>>>>>>> >>>>> some internal Ignite processes
> >>>>>>>>> >>>>> * PME
> >>>>>>>>> >>>>> * rebalance.
> >>>>>>>>> >>>>> This means we don't know: is rebalance (or PME) in 2.7.0
> >>>>>>>>> >>>>> faster or slower than in 2.8.0 for the same cluster?
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 3. We don't have a solution for automated testing of Ignite
> >>>>>>>>> >>>>> integrations in a real-world environment:
> >>>>>>>>> >>>>> the Ignite-Spark integration can be taken as an example.
> >>>>>>>>> >>>>> I think some ML solutions should also be tested in
> >>>>>>>>> >>>>> real-world deployments.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Solution:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> I propose to use the ducktape library from Confluent
> >>>>>>>>> >>>>> (Apache 2.0 license).
> >>>>>>>>> >>>>> I tested it both on a real cluster (Yandex Cloud) and in a
> >>>>>>>>> >>>>> local environment (docker), and it works just fine.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> The PoC contains the following services:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> * A simple rebalance test:
> >>>>>>>>> >>>>>       start 2 server nodes,
> >>>>>>>>> >>>>>       create some data with an Ignite client,
> >>>>>>>>> >>>>>       start one more server node,
> >>>>>>>>> >>>>>       wait for rebalance to finish
> >>>>>>>>> >>>>> * A simple Ignite-Spark integration test:
> >>>>>>>>> >>>>>       start 1 Spark master, start 1 Spark worker,
> >>>>>>>>> >>>>>       start 1 Ignite server node,
> >>>>>>>>> >>>>>       create some data with an Ignite client,
> >>>>>>>>> >>>>>       check the data in an application that queries it
> >>>>>>>>> >>>>>       from Spark.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> All tests are fully automated.
> >>>>>>>>> >>>>> Log collection works just fine.
> >>>>>>>>> >>>>> You can see an example of the test report - [4].
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Pros:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> * The ability to test local changes (no need to publish
> >>>>>>>>> >>>>>   changes to some remote repository or similar).
> >>>>>>>>> >>>>> * The ability to parametrize the test environment (run the
> >>>>>>>>> >>>>>   same tests on different JDKs, JVM params, configs, etc.)
> >>>>>>>>> >>>>> * Isolation by default, so system tests are as reliable as
> >>>>>>>>> >>>>>   possible.
> >>>>>>>>> >>>>> * Utilities for pulling up and tearing down services easily
> >>>>>>>>> >>>>>   in clusters in different environments (e.g. local, custom
> >>>>>>>>> >>>>>   cluster, Vagrant, K8s, Mesos, Docker, cloud providers,
> >>>>>>>>> >>>>>   etc.)
> >>>>>>>>> >>>>> * Easy to write unit tests for distributed systems.
> >>>>>>>>> >>>>> * Adopted and successfully used by another distributed
> >>>>>>>>> >>>>>   open-source project - Apache Kafka.
> >>>>>>>>> >>>>> * Collects results (e.g. logs, console output).
> >>>>>>>>> >>>>> * Reports results (e.g. expected conditions met,
> >>>>>>>>> >>>>>   performance results, etc.)
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> WDYT?
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>>>>>> >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>>>>>> >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>>>>>> >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >>>>
> >>>> <2020-07-05--004.tar.gz>