Discussed privately with Max. The results of the discussion are available in the Slack channel [1].
[1] https://the-asf.slack.com/archives/C016F4PS8KV/p1595336751234500

On Wed, Jul 15, 2020 at 3:59 PM Max Shonichev <mshon...@yandex.ru> wrote:
> Anton, Nikolay,
>
> I want to share some more findings about ducktests that I stumbled upon
> while porting them to Tiden.
>
> The first problem is that GridGain Tiden-based tests by default use a
> real, production-like configuration for Ignite nodes, notably:
>
> - persistence enabled
> - ~120 caches in ~40 groups
> - a data set of around 1M keys per cache
> - primitive and POJO cache values
> - extensive use of query entities (indices)
>
> When I tried to run 4 nodes with such a configuration in Docker, my
> notebook nearly burned. Nevertheless, the grid started and worked OK,
> except for one little 'but': each successive version under test started
> slower and slower.
>
> 2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little slower, and your
> fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within
> the default 120-second timeout. In order to mimic the behavior of your
> tests, I had to turn off persistence and use only 1 cache as well.
>
> It's a pity that you completely ignore persistence and indices in your
> ducktests; otherwise you would quickly have run into the same
> limitation.
>
> I hope to adapt the Tiden Docker PoC to our TeamCity soon, and we'll
> try to git-bisect in order to find where this slowdown comes from.
> After that I'll file a bug in the Ignite Jira.
>
>
> Another problem with your rebalance benchmark is its low accuracy due
> to the granularity of measurements.
>
> You don't actually measure rebalance time; you measure the time it
> takes to find a specific string in the logs, which is confusing.
>
> The scenario of your test is as follows:
>
> 1. start 3 server nodes
> 2. start 1 data-loading client, preload the data, stop the client
> 3. start 1 more server node
> 4. wait till the server joins the topology
> 5. wait till this server node completes the exchange and writes the
>    'rebalanced=true, wasRebalanced=false' message to the log
> 6. report the time taken by step 5 as 'Rebalance time'
>
> The confusing thing here is the 'wait till' implementation: you
> actually re-scan the logs continuously, sleeping one second between
> scans, until the message appears. That means the measured rebalance
> time has at least one-second granularity, or even coarser, though it
> is reported with nanosecond precision.
>
> But with such a lightweight configuration (a single in-memory cache)
> and such a small data set (only 1M keys), rebalancing is very fast and
> usually completes in under 1 second, or only slightly slower.
>
> Before waiting for the rebalance message, you first wait for the
> topology message, and that wait also takes time to execute.
>
> So, by the time the Python part of the test performs its first scan of
> the logs, rebalancing is in most cases already done, and the time you
> report as '0.0760810375213623' is actually the time it takes to execute
> the log-scanning code.
>
> However, if rebalancing finishes just a little later after the topology
> update, then the first scan of the logs fails, you sleep for a whole
> second and rescan the logs, and there you get your message and report
> it as '1.02205491065979'.
>
> Under different conditions, a dockerized application may run a little
> slower or a little faster, depending on overall system load, free
> memory, etc. I tried to increase the load on my laptop by running a
> browser or a Maven build, and the time to scan the logs fluctuated
> from 0.02 to 0.09 or even 1.02 seconds.
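To make the granularity issue concrete: the 'wait till' logic Max describes boils down to a polling loop like the minimal sketch below. This is only an illustration of the described behavior, not the actual ducktests code; the helper names are made up, the one-second poll interval comes from the description above, and the '[HH:MM:SS,mmm]' log timestamp format in the second helper is an assumption. The second helper shows one possible accuracy fix: deriving the duration from the node's own log timestamps rather than from the test's wall clock.

import re
import time

def wait_for_log_message(log_path, pattern, timeout=120, poll_interval=1.0):
    """Poll a log file until a line matches `pattern`; return the wall-clock wait.

    If the message is already present, the result is just the cost of one
    scan (e.g. ~0.07 s); if it appears a moment later, the result jumps by
    a whole poll interval (e.g. ~1.02 s), even though it is returned with
    float precision.
    """
    start = time.time()
    regex = re.compile(pattern)
    while time.time() - start < timeout:
        with open(log_path) as log:
            if any(regex.search(line) for line in log):
                return time.time() - start
        time.sleep(poll_interval)
    raise TimeoutError(f"'{pattern}' not found within {timeout} s")

def rebalance_time_from_log(log_path):
    """Possible fix: compute the duration from the node's own log timestamps.

    Assumes a '[HH:MM:SS,mmm]' timestamp prefix (an assumption about the log
    layout) and the two log messages quoted in this thread.
    """
    ts_re = re.compile(r"\[(\d{2}):(\d{2}):(\d{2}),(\d{3})\]")
    topology_ts = rebalanced_ts = None
    with open(log_path) as log:
        for line in log:
            m = ts_re.search(line)
            if not m:
                continue
            h, mnt, s, ms = (int(g) for g in m.groups())
            ts = h * 3600 + mnt * 60 + s + ms / 1000.0
            if topology_ts is None and "Topology snapshot" in line:
                topology_ts = ts
            elif "rebalanced=true, wasRebalanced=false" in line:
                rebalanced_ts = ts
                break
    if topology_ts is None or rebalanced_ts is None:
        raise ValueError("expected log messages not found")
    return rebalanced_ts - topology_ts

With log-derived timestamps, the reported figure no longer depends on when the Python side happens to rescan the file, which speaks directly to the accuracy question below.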
> Note that in a CI environment, high system load from other tenants is
> quite an ordinary situation.
>
> Suppose we adopted the rebalance improvements and all versions after
> 2.9.0 performed within 1 second, just like 2.9.0 itself. Then your
> benchmark could report a false negative (e.g. 0.02 for master and 0.03
> for the PR), while on the next re-run it would pass (e.g. 0.07 for
> master and 0.03 for the PR). That's not quite the 'stable and
> non-flaky' test the Ignite community wants.
>
> What suggestions do you have to improve benchmark measurement accuracy?
>
>
> A third question is about the PME-free switch benchmark. Under some
> conditions, LongTxStreamerApplication actually hangs up PME. It needs
> to be investigated further, but this was either due to persistence
> being enabled or due to a missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false.
>
> Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX
> option? Also, have you performed a test of the PME-free switch with
> persistence-enabled caches?
>
>
> On 09.07.2020 10:11, Max Shonichev wrote:
> > Anton,
> >
> > well, strange thing, but a clean-up and rerun helped.
> >
> >
> > Ubuntu 18.04
> >
> > ====================================================================================================
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id: 2020-07-06--003
> > run time: 4 minutes 44.835 seconds
> > tests run: 5
> > passed: 5
> > failed: 0
> > ignored: 0
> > ====================================================================================================
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > status: PASS
> > run time: 41.927 seconds
> > {"Rebalanced in (sec)": 1.02205491065979}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> > status: PASS
> > run time: 51.985 seconds
> > {"Rebalanced in (sec)": 0.0760810375213623}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > status: PASS
> > run time: 1 minute 4.283 seconds
> > {"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst
> > latency (ms)": "31035"}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> > status: PASS
> > run time: 1 minute 13.089 seconds
> > {"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst
> > latency (ms)": "139"}
> > ----------------------------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> > status: PASS
> > run time: 53.332 seconds
> > ----------------------------------------------------------------------------------------------------
> >
> >
> > MacBook
> > ================================================================================
> > SESSION REPORT (ALL TESTS)
> > ducktape version: 0.7.7
> > session_id: 2020-07-06--001
> > run time: 6 minutes 58.612 seconds
> > tests run: 5
> > passed: 5
> > failed: 0
> > ignored: 0
> > ================================================================================
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> > status: PASS
> > run time: 48.724 seconds
> > {"Rebalanced in (sec)": 3.2574470043182373}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev
> > status: PASS
> > run time: 1 minute 23.210 seconds
> > {"Rebalanced in (sec)": 2.165921211242676}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> > status: PASS
> > run time: 1 minute 12.659 seconds
> > {"Streamed txs": "642", "Measure duration (ms)": "33177", "Worst latency
> > (ms)": "31063"}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev
> > status: PASS
> > run time: 1 minute 57.257 seconds
> > {"Streamed txs": "32924", "Measure duration (ms)": "48252", "Worst
> > latency (ms)": "1010"}
> > --------------------------------------------------------------------------------
> > test_id:
> > ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client
> > status: PASS
> > run time: 1 minute 36.317 seconds
> >
> > =============
> >
> > While the relative proportions remain the same across Ignite versions,
> > the absolute numbers for Mac/Linux differ by more than a factor of two.
> >
> > I'm finalizing the code for the 'local Tiden' appliance for your tests.
> > The PR will be ready soon.
> >
> > Have you had a chance to deploy ducktests on bare metal?
> >
> >
> > On 06.07.2020 14:27, Anton Vinogradov wrote:
> >> Max,
> >>
> >> Thanks for the check!
> >>
> >>> Is it OK for those tests to fail?
> >> No.
> >> I see really strange things in the logs.
> >> It looks like a concurrent ducktests run started unexpected services,
> >> and this broke the tests.
> >> Could you please clean up the Docker environment (use the clean-up
> >> script [1]), compile the sources (use the build script [2]), and rerun
> >> the tests.
> >>
> >> [1]
> >> https://github.com/anton-vinogradov/ignite/blob/dc98ee9df90b25eb5d928090b0e78b48cae2392e/modules/ducktests/tests/docker/clean_up.sh
> >> [2]
> >> https://github.com/anton-vinogradov/ignite/blob/3c39983005bd9eaf8cb458950d942fb592fff85c/scripts/build.sh
> >>
> >> On Mon, Jul 6, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
> >> wrote:
> >>
> >>> Hello, Maxim.
> >>>
> >>> Thanks for writing down the minutes.
> >>>
> >>> There is no such thing as a «Nikolay team» on the dev list.
> >>> I propose to focus on product requirements and what we want to gain
> >>> from the framework instead of taking into account the needs of some
> >>> team.
> >>>
> >>> Can you, please, write down your version of the requirements so we
> >>> can reach a consensus on them and then move on to the discussion of
> >>> the implementation?
> >>>
> >>>> On July 6, 2020, at 11:18, Max Shonichev <mshon...@yandex.ru> wrote:
> >>>>
> >>>> Yes, Denis,
> >>>>
> >>>> the common ground seems to be as follows:
> >>>> Anton Vinogradov and Nikolay Izhikov will try to prepare and run the
> >>>> PoC over physical hosts and share benchmark results.
> >>>> In the meantime, while I strongly believe that a dockerized approach
> >>>> to benchmarking is a road to misleading results and false positives,
> >>>> I'll prepare a PoC of Tiden in a dockerized environment to support
> >>>> the 'fast development prototyping' use case Nikolay's team insists
> >>>> on. It should be a matter of a few days.
> >>>>
> >>>> As a side note, I've run Anton's PoC locally and would like to get
> >>>> some comments about the results:
> >>>>
> >>>> Test system: Ubuntu 18.04, docker 19.03.6
> >>>> Test commands:
> >>>>
> >>>> git clone -b ignite-ducktape g...@github.com:anton-vinogradov/ignite.git
> >>>> cd ignite
> >>>> mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Pall-java,licenses,lgpl,examples,!spark-2.4,!spark,!scala
> >>>> cd modules/ducktests/tests/docker
> >>>> ./run_tests.sh
> >>>>
> >>>> Test results:
> >>>>
> >>>> ====================================================================================================
> >>>> SESSION REPORT (ALL TESTS)
> >>>> ducktape version: 0.7.7
> >>>> session_id: 2020-07-05--004
> >>>> run time: 7 minutes 36.360 seconds
> >>>> tests run: 5
> >>>> passed: 3
> >>>> failed: 2
> >>>> ignored: 0
> >>>> ====================================================================================================
> >>>> test_id:
> >>>> ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1
> >>>> status: FAIL
> >>>> run time: 3 minutes 12.232 seconds
> >>>> ----------------------------------------------------------------------------------------------------
> >>>> test_id:
> >>>> ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6
> >>>> status: FAIL
> >>>> run time: 1 minute 33.076 seconds
> >>>>
> >>>> Is it OK for those tests to fail? Attached is the full test report.
> >>>>
> >>>>
> >>>> On 02.07.2020 17:46, Denis Magda wrote:
> >>>>> Folks,
> >>>>> Please share the summary of that Slack conversation here for the
> >>>>> record once you find common ground.
> >>>>> -
> >>>>> Denis
> >>>>> On Thu, Jul 2, 2020 at 3:22 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>> Igniters.
> >>>>>>
> >>>>>> All who are interested in the integration testing framework
> >>>>>> discussion are welcome in the Slack channel -
> >>>>>> https://join.slack.com/share/zt-fk2ovehf-TcomEAwiXaPzLyNKZbmfzw?cdn_fallback=2
> >>>>>>
> >>>>>>> On July 2, 2020, at 13:06, Anton Vinogradov <a...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Max,
> >>>>>>> Thanks for joining us.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts.
> >>>>>>> No. It is important to distinguish development, deployment, and
> >>>>>>> orchestration. All-in-one solutions have extremely limited
> >>>>>>> usability.
> >>>>>>> As to Ducktests:
> >>>>>>> Docker is responsible for deployments during development.
> >>>>>>> CI/CD is responsible for deployments during release and nightly
> >>>>>>> checks. It's up to the team to choose AWS, VMs, bare metal, and
> >>>>>>> even the OS.
> >>>>>>> Ducktape is responsible for orchestration.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>> No. Ducktape may start any service in parallel. See the Pme-free
> >>>>>>> benchmark [1] for details.
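For reference, "start in parallel" in a test like this amounts to something along the lines of the sketch below: fire off all node starts at once, then wait for each to complete. The `nodes` objects and their `start()` method stand in for whatever service abstraction the framework provides; this is an illustration under assumed names, not ducktape's internal scheduler.

from concurrent.futures import ThreadPoolExecutor

def start_cluster_in_parallel(nodes):
    """Start all nodes concurrently and block until every start completes.

    `nodes` is any collection of objects with a blocking start() method
    (an assumed stand-in for the framework's service/node abstraction).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [pool.submit(node.start) for node in nodes]
        for future in futures:
            future.result()  # re-raises if any node failed to start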
> >>>>>>>> If we used the ducktape solution, we would have to instead
> >>>>>>>> prepare some deployment scripts to pre-initialize the Sberbank
> >>>>>>>> hosts, for example, with Ansible or Chef.
> >>>>>>> Sure, because the way of deployment depends on the infrastructure.
> >>>>>>> How can we be sure that the OS we use and the restrictions we have
> >>>>>>> will be compatible with Tiden?
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>>>> dependencies into one uber-image ...
> >>>>>>> and
> >>>>>>>> I guess we all know about docker's hyped ability to run over
> >>>>>>>> distributed virtual networks.
> >>>>>>> It is very important not to confuse test development (the docker
> >>>>>>> image you're talking about) and real deployment.
> >>>>>>>
> >>>>>>>> If we had stopped and started 5 nodes one-by-one, as ducktape does
> >>>>>>> All actions can be performed in parallel.
> >>>>>>> See how Ducktests [2] starts the cluster in parallel, for example.
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> https://github.com/apache/ignite/pull/7967/files#diff-59adde2a2ab7dc17aea6c65153dfcda7R84
> >>>>>>> [2]
> >>>>>>> https://github.com/apache/ignite/pull/7967/files#diff-d6a7b19f30f349d426b8894a40389cf5R79
> >>>>>>>
> >>>>>>> On Thu, Jul 2, 2020 at 1:00 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>> Hello, Maxim.
> >>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts
> >>>>>>> Why do you think that maintaining deployment scripts coupled with
> >>>>>>> the testing framework is an advantage?
> >>>>>>> I thought we wanted to see and maintain deployment scripts
> >>>>>>> separately from the testing framework.
> >>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>> Can you, please, clarify what actions you have in mind?
> >>>>>>> And why would we want to execute them concurrently?
> >>>>>>> Ignite node start and client application execution can be done
> >>>>>>> concurrently with the ducktape approach.
> >>>>>>>
> >>>>>>>> If we used ducktape solution we would have to instead prepare some
> >>>>>>>> deployment scripts to pre-initialize Sberbank hosts, for example,
> >>>>>>>> with Ansible or Chef
> >>>>>>> We shouldn't take one user's approach as an argument in this
> >>>>>>> discussion. Let's discuss a general approach for all users of
> >>>>>>> Ignite. Anyway, what is wrong with the external deployment script
> >>>>>>> approach?
> >>>>>>>
> >>>>>>> We, as a community, should provide several ways to run integration
> >>>>>>> tests out-of-the-box AND the ability to customize deployment for
> >>>>>>> the user's landscape.
> >>>>>>>
> >>>>>>>> You have solved this deficiency with docker by putting all
> >>>>>>>> dependencies into one uber-image and that looks like a simple and
> >>>>>>>> elegant solution however, that effectively limits you to
> >>>>>>>> single-host testing.
> >>>>>>> The Docker image should be used only by Ignite developers to test
> >>>>>>> something locally.
> >>>>>>> It's not intended for real-world testing.
> >>>>>>>
> >>>>>>> The main issue with Tiden that I see is that it is tested and
> >>>>>>> maintained as a closed-source solution.
> >>>>>>> This can lead to hard-to-solve problems when we start using and
> >>>>>>> maintaining it as an open-source solution.
> >>>>>>> For instance, how many developers have used Tiden? And how many of
> >>>>>>> those developers were not authors of Tiden itself?
> >>>>>>>
> >>>>>>>
> >>>>>>>> On July 2, 2020, at 12:30, Max Shonichev <mshon...@yandex.ru> wrote:
> >>>>>>>>
> >>>>>>>> Anton, Nikolay,
> >>>>>>>>
> >>>>>>>> Let's agree on what we are arguing about: whether it is about
> >>>>>>>> "like or don't like" or about the technical properties of the
> >>>>>>>> suggested solutions.
> >>>>>>>>
> >>>>>>>> If it is about likes and dislikes, then the whole discussion is
> >>>>>>>> meaningless. However, I hope together we can analyse the pros and
> >>>>>>>> cons carefully.
> >>>>>>>>
> >>>>>>>> As far as I can understand now, the two main differences between
> >>>>>>>> ducktape and tiden are that:
> >>>>>>>>
> >>>>>>>> 1. tiden can deploy artifacts by itself, while ducktape relies on
> >>>>>>>> dependencies being deployed by external scripts.
> >>>>>>>>
> >>>>>>>> 2. tiden can execute actions over remote nodes in real parallel
> >>>>>>>> fashion, while ducktape internally does all actions sequentially.
> >>>>>>>>
> >>>>>>>> As for me, these are very important properties for a distributed
> >>>>>>>> testing framework.
> >>>>>>>>
> >>>>>>>> The first property lets us easily reuse tiden in existing
> >>>>>>>> infrastructures; for example, during Zookeeper IEP testing at the
> >>>>>>>> Sberbank site we used the same tiden scripts that we use in our
> >>>>>>>> lab; the only change was putting a list of hosts into the config.
> >>>>>>>>
> >>>>>>>> If we used the ducktape solution, we would have to instead prepare
> >>>>>>>> some deployment scripts to pre-initialize the Sberbank hosts, for
> >>>>>>>> example, with Ansible or Chef.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You have solved this deficiency with Docker by putting all
> >>>>>>>> dependencies into one uber-image, and that looks like a simple and
> >>>>>>>> elegant solution; however, it effectively limits you to
> >>>>>>>> single-host testing.
> >>>>>>>>
> >>>>>>>> I guess we all know about Docker's hyped ability to run over
> >>>>>>>> distributed virtual networks. We used to go that way, but quickly
> >>>>>>>> found that it is more hype than real work. In real environments,
> >>>>>>>> there are problems with routing, DNS, multicast and broadcast
> >>>>>>>> traffic, and many others, that turn a docker-based distributed
> >>>>>>>> solution into a fragile, hard-to-maintain monster.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform a run of your PoC over
> >>>>>>>> at least two physical hosts and share the results with us.
> >>>>>>>>
> >>>>>>>> If you consider one physical docker host enough, please don't
> >>>>>>>> overlook that we want to run real-scale scenarios, with 50-100
> >>>>>>>> cache groups, persistence enabled and millions of keys loaded.
> >>>>>>>>
> >>>>>>>> The practical limit for such configurations is 4-6 nodes per
> >>>>>>>> single physical host. Otherwise, tests become flaky due to
> >>>>>>>> resource starvation.
> >>>>>>>>
> >>>>>>>> Please, if you believe otherwise, perform at least 10 runs of
> >>>>>>>> your PoC with other tests running on TC (we're targeting TeamCity,
> >>>>>>>> right?) and share the results so we can check whether the numbers
> >>>>>>>> are reproducible.
> >>>>>>>> I stress this once more: functional integration tests are OK to
> >>>>>>>> run in Docker and CI, but running benchmarks in Docker is a big
> >>>>>>>> NO GO.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The second property lets us write tests that require real-parallel
> >>>>>>>> actions over hosts.
> >>>>>>>>
> >>>>>>>> For example, the agreed scenario for the PME benchmark during the
> >>>>>>>> "PME optimization stream" was as follows:
> >>>>>>>>
> >>>>>>>> - 10 server nodes, preloaded with 1M keys
> >>>>>>>> - 4 client nodes perform transactional load (client nodes
> >>>>>>>>   physically separated from server nodes)
> >>>>>>>> - during load:
> >>>>>>>> -- 5 server nodes are stopped in parallel
> >>>>>>>> -- after 1 minute, all 5 nodes are started in parallel
> >>>>>>>> - the load is stopped, and the logs are analysed for exchange
> >>>>>>>>   times.
> >>>>>>>>
> >>>>>>>> If we had stopped and started the 5 nodes one-by-one, as ducktape
> >>>>>>>> does, then the partition map exchange merge would not happen and
> >>>>>>>> we could not have measured the PME optimizations for that case.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> These are limitations of ducktape that we believe are a more
> >>>>>>>> important argument "against" than the ones you provide "for".
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 30.06.2020 14:58, Anton Vinogradov wrote:
> >>>>>>>>> Folks,
> >>>>>>>>> First, I've created PR [1] with ducktests improvements.
> >>>>>>>>> The PR contains the following changes:
> >>>>>>>>> - Pme-free switch proof-benchmark (2.7.6 vs master)
> >>>>>>>>> - Ability to check (compare with) previous releases (e.g. 2.7.6
> >>>>>>>>>   & 2.8)
> >>>>>>>>> - Global refactoring
> >>>>>>>>> -- benchmarks javacode simplification
> >>>>>>>>> -- services python and java classes code deduplication
> >>>>>>>>> -- fail-fast checks for java and python (e.g. an application
> >>>>>>>>>    should explicitly write that it finished with success)
> >>>>>>>>> -- simple results extraction from tests and benchmarks
> >>>>>>>>> -- javacode now configurable from tests/benchmarks
> >>>>>>>>> -- proper SIGTERM handling in javacode (e.g. it may finish the
> >>>>>>>>>    last operation and log results)
> >>>>>>>>> -- docker volume now marked as delegated to increase execution
> >>>>>>>>>    speed for mac & win users
> >>>>>>>>> -- the Ignite cluster now starts in parallel (start speed-up)
> >>>>>>>>> -- Ignite can be configured at the test/benchmark
> >>>>>>>>> - full and module assembly scripts added
> >>>>>>>> Great job done! But let me recall one of the Apache Ignite
> >>>>>>>> principles: a week of thinking saves months of development.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Second, I'd like to propose to accept ducktests [2] (the ducktape
> >>>>>>>>> integration) as the target "PoC check & real topology
> >>>>>>>>> benchmarking tool".
> >>>>>>>>> Ducktape pros:
> >>>>>>>>> - Developed for distributed systems by distributed systems
> >>>>>>>>>   developers.
> >>>>>>>> So is Tiden.
> >>>>>>>>
> >>>>>>>>> - Developed since 2014, stable.
> >>>>>>>> Tiden is also pretty stable, and the development start date is not
> >>>>>>>> a good argument; for example, pytest dates back to 2004 and
> >>>>>>>> pytest-xdist (a plugin for distributed testing) to 2010, but we
> >>>>>>>> don't see them as an alternative at all.
> >>>>>>>>
> >>>>>>>>> - Proven usability by usage at Kafka.
> >>>>>>>> Tiden is proven usable by usage in GridGain and Sberbank
> >>>>>>>> deployments.
> >>>>>>>> The core, storage, SQL and tx teams use benchmark results provided
> >>>>>>>> by Tiden on a daily basis.
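Coming back to the PME scenario Max describes above: the step that matters is stopping, and later restarting, the 5 nodes at nearly the same moment, so that the resulting partition map exchanges can be merged into one; a sequential loop would trigger 5 separate exchanges and measure something else entirely. A hedged sketch of that orchestration step, with stop()/start() as assumed blocking node operations:

import time
from concurrent.futures import ThreadPoolExecutor

def stop_start_in_parallel(nodes, downtime_sec=60):
    """Stop half of the cluster in parallel, wait, then start it back.

    Near-simultaneous stops/starts are what allow exchange merging to
    kick in; node.stop()/node.start() are assumed operations used here
    purely for illustration.
    """
    victims = nodes[:len(nodes) // 2]
    with ThreadPoolExecutor(max_workers=max(1, len(victims))) as pool:
        list(pool.map(lambda node: node.stop(), victims))   # all stops at once
    time.sleep(downtime_sec)                                # keep them down for a minute
    with ThreadPoolExecutor(max_workers=max(1, len(victims))) as pool:
        list(pool.map(lambda node: node.start(), victims))  # all starts at once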
> >>>>>>>>> - Dozens and dozens of tests and benchmarks at Kafka as a great
> >>>>>>>>>   example pack.
> >>>>>>>> We'll donate some of our suites to Ignite, as I mentioned in my
> >>>>>>>> previous letter.
> >>>>>>>>
> >>>>>>>>> - Built-in Docker support for rapid development and checks.
> >>>>>>>> False, there's no specific 'docker support' in ducktape itself;
> >>>>>>>> you just wrap it in docker yourself, because ducktape lacks
> >>>>>>>> deployment abilities.
> >>>>>>>>
> >>>>>>>>> - Great for CI automation.
> >>>>>>>> False, there are no specific CI-enabled features in ducktape.
> >>>>>>>> Tiden, on the other hand, provides a generic xUnit reporting
> >>>>>>>> format, which is supported by both TeamCity and Jenkins. Also,
> >>>>>>>> instead of using private keys, Tiden can use an SSH agent, which
> >>>>>>>> is also great for CI, because both TeamCity and Jenkins store keys
> >>>>>>>> in secret storage available only to the ssh-agent and only for the
> >>>>>>>> time of the test.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> As additional motivation, at least 3 teams
> >>>>>>>>> - the IEP-45 team (to check the crash-recovery speed-up
> >>>>>>>>>   (discovery and Zabbix speed-up))
> >>>>>>>>> - the Ignite SE Plugins team (to check that plugin features do
> >>>>>>>>>   not slow down or break AI features)
> >>>>>>>>> - the Ignite SE QA team (to append already developed
> >>>>>>>>>   smoke/load/failover tests to the AI codebase)
> >>>>>>>> Please, before recommending your tests to other teams, provide
> >>>>>>>> proof that your tests are reproducible in a real environment.
> >>>>>>>>
> >>>>>>>>> now wait for the ducktests merge to start checking the cases they
> >>>>>>>>> are working on in the AI way.
> >>>>>>>>> Thoughts?
> >>>>>>>> Let us review both solutions together: we'll try to run your tests
> >>>>>>>> in our lab, and you'll try to at least check out tiden and see if
> >>>>>>>> the same tests can be implemented with it?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> [1] https://github.com/apache/ignite/pull/7967
> >>>>>>>>> [2] https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>> On Tue, Jun 16, 2020 at 12:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>>>> Hello, Maxim.
> >>>>>>>>> Thank you for such a detailed explanation.
> >>>>>>>>> Can we put the content of this discussion somewhere on the wiki,
> >>>>>>>>> so it doesn't get lost?
> >>>>>>>>> I've divided the answer into several parts, from the requirements
> >>>>>>>>> to the implementation.
> >>>>>>>>> So, if we agree on the requirements, we can proceed with the
> >>>>>>>>> discussion of the implementation.
> >>>>>>>>> 1. Requirements:
> >>>>>>>>> The main goal I want to achieve is *reproducibility* of the
> >>>>>>>>> tests.
> >>>>>>>>> I'm sick and tired of the zillions of flaky, rarely failing, and
> >>>>>>>>> almost-never-failing tests in the Ignite codebase.
> >>>>>>>>> We should start with the simplest scenarios, which will be as
> >>>>>>>>> reliable as steel :)
> >>>>>>>>> I want to know for sure:
> >>>>>>>>> - Does this PR make rebalance quicker or not?
> >>>>>>>>> - Does this PR make PME quicker or not?
> >>>>>>>>> So, your description of the complex test scenario looks like a
> >>>>>>>>> next step to me.
> >>>>>>>>> Anyway, it's cool that we already have one.
> >>>>>>>>> The second goal is to have a strict test lifecycle, as we have in
> >>>>>>>>> JUnit and similar frameworks.
> >>>>>>>>> > It covers production-like deployment and running scenarios
> >>>>>>>>> > over a single database instance.
> >>>>>>>>> Do you mean a «single cluster» or a «single host»?
> >>>>>>>>> 2. Existing tests:
> >>>>>>>>> > A Combinator suite allows running a set of operations
> >>>>>>>>> > concurrently over a given database instance.
> >>>>>>>>> > A Consumption suite allows running a set of production-like
> >>>>>>>>> > actions over a given set of Ignite/GridGain versions and
> >>>>>>>>> > comparing test metrics across versions
> >>>>>>>>> > A Yardstick suite
> >>>>>>>>> > A Stress suite that simulates hardware environment degradation
> >>>>>>>>> > An Ultimate, DR and Compatibility suites that perform
> >>>>>>>>> > functional regression testing
> >>>>>>>>> > Regression
> >>>>>>>>> Great news that we already have so many choices for testing!
> >>>>>>>>> A mature test base is a big +1 for Tiden.
> >>>>>>>>> 3. Comparison:
> >>>>>>>>> > Criteria: Test configuration
> >>>>>>>>> > Ducktape: single JSON string for all tests
> >>>>>>>>> > Tiden: any number of YAML config files, command line options
> >>>>>>>>> > for fine-grained test configuration, ability to select/modify
> >>>>>>>>> > test behavior based on Ignite version.
> >>>>>>>>> 1. Many YAML files can be hard to maintain.
> >>>>>>>>> 2. In ducktape, you can set parameters via the «--parameters»
> >>>>>>>>> option. Please take a look at the doc [1].
> >>>>>>>>> > Criteria: Cluster control
> >>>>>>>>> > Tiden: additionally can address the cluster as a whole and
> >>>>>>>>> > execute remote commands in parallel.
> >>>>>>>>> It seems we have implemented this ability in the PoC already.
> >>>>>>>>> > Criteria: Test assertions
> >>>>>>>>> > Tiden: simple asserts, plus a few customized assertion helpers.
> >>>>>>>>> > Ducktape: simple asserts.
> >>>>>>>>> Can you, please, be more specific?
> >>>>>>>>> What helpers do you have in mind?
> >>>>>>>>> Ducktape has asserts that wait for log file messages or for a
> >>>>>>>>> process to finish.
> >>>>>>>>> > Criteria: Test reporting
> >>>>>>>>> > Ducktape: limited to its own text/HTML format
> >>>>>>>>> Ducktape has
> >>>>>>>>> 1. a text reporter
> >>>>>>>>> 2. a customizable HTML reporter
> >>>>>>>>> 3. a JSON reporter.
> >>>>>>>>> We can render the JSON with any template or tool.
> >>>>>>>>> > Criteria: Provisioning and deployment
> >>>>>>>>> > Ducktape: can provision a subset of hosts from the cluster for
> >>>>>>>>> > test needs. However, that means that a test can't be scaled
> >>>>>>>>> > without test code changes. Does not do any deployment; relies
> >>>>>>>>> > on external means, e.g. pre-packaged in a docker image, as in
> >>>>>>>>> > the PoC.
> >>>>>>>>> This is not true.
> >>>>>>>>> 1. We can set explicit test parameters (node number) via
> >>>>>>>>> parameters. We can increase the client count or cluster size
> >>>>>>>>> without test code changes.
> >>>>>>>>> 2. We have many choices for the test environment. These choices
> >>>>>>>>> are tested and used in other projects:
> >>>>>>>>> * docker
> >>>>>>>>> * vagrant
> >>>>>>>>> * private cloud (ssh access)
> >>>>>>>>> * ec2
> >>>>>>>>> Please take a look at the Kafka documentation [2].
> >>>>>>>>> > I can continue more on this, but it should be enough for now:
> >>>>>>>>> We need to go deeper! :)
> >>>>>>>>> [1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
> >>>>>>>>> [2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart
> >>>>>>>>>
> >>>>>>>>> > On June 9, 2020, at 17:25, Max A. Shonichev <mshon...@yandex.ru> wrote:
> >>>>>>>>> >
> >>>>>>>>> > Greetings, Nikolay,
> >>>>>>>>> >
> >>>>>>>>> > First of all, thank you for your great effort preparing a PoC
> >>>>>>>>> > of integration testing for the Ignite community.
> >>>>>>>>> >
> >>>>>>>>> > It's a shame Ignite did not have at least some such tests yet;
> >>>>>>>>> > however, GridGain, as a major contributor to Apache Ignite, has
> >>>>>>>>> > had a profound collection of in-house tools for integration and
> >>>>>>>>> > performance testing for years already, and while we have been
> >>>>>>>>> > slowly considering sharing our expertise with the community,
> >>>>>>>>> > your initiative makes us drive that process a bit faster,
> >>>>>>>>> > thanks a lot!
> >>>>>>>>> >
> >>>>>>>>> > I reviewed your PoC and want to share a little about what we do
> >>>>>>>>> > on our part, why and how; I hope it will help the community
> >>>>>>>>> > take the proper course.
> >>>>>>>>> >
> >>>>>>>>> > First I'll give a brief overview of what decisions we made and
> >>>>>>>>> > what we have in our private code base, next I'll describe what
> >>>>>>>>> > we have already donated to the public and what we plan to
> >>>>>>>>> > publish next, then I'll compare both approaches, highlighting
> >>>>>>>>> > deficiencies in order to spur public discussion on the matter.
> >>>>>>>>> >
> >>>>>>>>> > It might seem strange to use Python to run Bash to run Java
> >>>>>>>>> > applications, because that introduces the IT industry's 'best
> >>>>>>>>> > of breed', the Python dependency hell, into the Java
> >>>>>>>>> > application code base. The only stranger decision one could
> >>>>>>>>> > make is to use Maven to run Docker to run Bash to run Python to
> >>>>>>>>> > run Bash to run Java, but desperate times call for desperate
> >>>>>>>>> > measures, I guess.
> >>>>>>>>> >
> >>>>>>>>> > There are Java-based solutions for integration testing, e.g.
> >>>>>>>>> > Testcontainers [1], Arquillian [2], etc., and they might suit
> >>>>>>>>> > Ignite community CI pipelines well by themselves. But we also
> >>>>>>>>> > wanted to run performance tests and benchmarks, like the
> >>>>>>>>> > dreaded PME benchmark, and this is solved by a totally
> >>>>>>>>> > different set of tools in the Java world, e.g. JMeter [3],
> >>>>>>>>> > OpenJMH [4], Gatling [5], etc.
> >>>>>>>>> >
> >>>>>>>>> > Speaking specifically about benchmarking, the Apache Ignite
> >>>>>>>>> > community already has Yardstick [6], and there's nothing wrong
> >>>>>>>>> > with writing a PME benchmark using Yardstick, but we also
> >>>>>>>>> > wanted to be able to run scenarios like this:
> >>>>>>>>> > - put load X on an Ignite database;
> >>>>>>>>> > - perform a set of operations Y to check how Ignite copes with
> >>>>>>>>> >   operations under load.
> >>>>>>>>> >
> >>>>>>>>> > And yes, we also wanted the applications under test to be
> >>>>>>>>> > deployed 'like in production', e.g. distributed over a set of
> >>>>>>>>> > hosts.
> >>>>>>>>> > This raises questions about provisioning and node affinity,
> >>>>>>>>> > which I'll cover in detail later.
> >>>>>>>>> >
> >>>>>>>>> > So we decided to put in a little effort to build a simple tool
> >>>>>>>>> > to cover different integration and performance scenarios, and
> >>>>>>>>> > our QA lab's first attempt was PoC-Tester [7], currently open
> >>>>>>>>> > source except for the reporting web UI. It's a quite
> >>>>>>>>> > simple-to-use, 95% Java-based tool targeted at the pre-release
> >>>>>>>>> > QA stage.
> >>>>>>>>> >
> >>>>>>>>> > It covers production-like deployment and running scenarios over
> >>>>>>>>> > a single database instance. PoC-Tester scenarios consist of a
> >>>>>>>>> > sequence of tasks running sequentially or in parallel. After
> >>>>>>>>> > all tasks complete, or at any time during the test, the user
> >>>>>>>>> > can run a log collection task; the logs are checked for
> >>>>>>>>> > exceptions, and a summary of the found issues and task
> >>>>>>>>> > ops/latency statistics is generated at the end of the scenario.
> >>>>>>>>> > One of the main PoC-Tester features is its fire-and-forget
> >>>>>>>>> > approach to task management. That is, you can deploy a grid and
> >>>>>>>>> > leave it running for weeks, periodically firing some tasks onto
> >>>>>>>>> > it.
> >>>>>>>>> >
> >>>>>>>>> > During the earliest stages of PoC-Tester development it became
> >>>>>>>>> > quite clear that Java application development is a tedious
> >>>>>>>>> > process, and the architecture decisions you take during
> >>>>>>>>> > development are slow and hard to change.
> >>>>>>>>> > For example, scenarios like this
> >>>>>>>>> > - deploy two instances of GridGain with master-slave data
> >>>>>>>>> >   replication configured;
> >>>>>>>>> > - put a load on the master;
> >>>>>>>>> > - perform checks on the slave,
> >>>>>>>>> > or like this:
> >>>>>>>>> > - preload 1 TB of data into Apache Ignite version X using your
> >>>>>>>>> >   favorite tool of choice;
> >>>>>>>>> > - run a set of functional tests on Apache Ignite version Y over
> >>>>>>>>> >   the preloaded data,
> >>>>>>>>> > do not fit well into the PoC-Tester workflow.
> >>>>>>>>> >
> >>>>>>>>> > So, this is why we decided to use Python as our generic
> >>>>>>>>> > scripting language of choice.
> >>>>>>>>> >
> >>>>>>>>> > Pros:
> >>>>>>>>> > - quicker prototyping and development cycles
> >>>>>>>>> > - easier to find a DevOps/QA engineer with Python skills than
> >>>>>>>>> >   one with Java skills
> >>>>>>>>> > - used extensively all over the world for DevOps/CI pipelines
> >>>>>>>>> >   and thus has a rich set of libraries for all possible
> >>>>>>>>> >   integration use cases.
> >>>>>>>>> >
> >>>>>>>>> > Cons:
> >>>>>>>>> > - a nightmare with dependencies; better to stick to specific
> >>>>>>>>> >   language/library versions.
> >>>>>>>>> > Comparing alternatives for a Python-based testing framework,
> >>>>>>>>> > we considered the following requirements, somewhat similar to
> >>>>>>>>> > what you mentioned for Confluent [8] previously:
> >>>>>>>>> > - should be able to run locally or distributed (bare metal or
> >>>>>>>>> >   in the cloud)
> >>>>>>>>> > - should have built-in deployment facilities for applications
> >>>>>>>>> >   under test
> >>>>>>>>> > - should separate test configuration and test code
> >>>>>>>>> > -- be able to easily reconfigure tests by simple configuration
> >>>>>>>>> >    changes
> >>>>>>>>> > -- be able to easily scale the test environment by simple
> >>>>>>>>> >    configuration changes
> >>>>>>>>> > -- be able to perform regression testing by simply switching
> >>>>>>>>> >    the artifacts under test via configuration
> >>>>>>>>> > -- be able to run tests with different JDK versions by simple
> >>>>>>>>> >    configuration changes
> >>>>>>>>> > - should have human-readable reports and/or reporting tools
> >>>>>>>>> >   integration
> >>>>>>>>> > - should allow simple test progress monitoring; one does not
> >>>>>>>>> >   want to run a 6-hour test only to find out that the
> >>>>>>>>> >   application actually crashed during the first hour
> >>>>>>>>> > - should allow parallel execution of test actions
> >>>>>>>>> > - should have a clean API for test writers
> >>>>>>>>> > -- a clean API for distributed remote command execution
> >>>>>>>>> > -- a clean API for starting/stopping deployed applications and
> >>>>>>>>> >    other operations
> >>>>>>>>> > -- a clean API for performing checks on results
> >>>>>>>>> > - should be open source, or at least the source code should
> >>>>>>>>> >   allow easy change or extension
> >>>>>>>>> >
> >>>>>>>>> > Back at that time we found no better alternative than to write
> >>>>>>>>> > our own framework, and here comes Tiden [9] as GridGain's
> >>>>>>>>> > framework of choice for functional integration and performance
> >>>>>>>>> > testing.
> >>>>>>>>> >
> >>>>>>>>> > Pros:
> >>>>>>>>> > - solves all the requirements above
> >>>>>>>>> > Cons (for Ignite):
> >>>>>>>>> > - (currently) closed GridGain source
> >>>>>>>>> >
> >>>>>>>>> > On top of Tiden we've built a set of test suites, some of which
> >>>>>>>>> > you might have heard of already.
> >>>>>>>>> >
> >>>>>>>>> > A Combinator suite allows running a set of operations
> >>>>>>>>> > concurrently over a given database instance. Proven to find at
> >>>>>>>>> > least 30+ race conditions and NPE issues.
> >>>>>>>>> >
> >>>>>>>>> > A Consumption suite allows running a set of production-like
> >>>>>>>>> > actions over a given set of Ignite/GridGain versions and
> >>>>>>>>> > comparing test metrics across versions, like heap/disk/CPU
> >>>>>>>>> > consumption and the time to perform actions such as client PME,
> >>>>>>>>> > server PME, rebalancing, data replication, etc.
> >>>>>>>>> >
> >>>>>>>>> > A Yardstick suite is a thin layer of Python glue code to run
> >>>>>>>>> > the Apache Ignite pre-release benchmark set. Yardstick itself
> >>>>>>>>> > has mediocre deployment capabilities; Tiden solves this easily.
> >>>>>>>>> >
> >>>>>>>>> > A Stress suite simulates hardware environment degradation
> >>>>>>>>> > during testing.
> >>>>>>>>> > Ultimate, DR and Compatibility suites perform functional
> >>>>>>>>> > regression testing of GridGain Ultimate Edition features like
> >>>>>>>>> > snapshots, security, data replication, rolling upgrades, etc.
> >>>>>>>>> >
> >>>>>>>>> > A Regression suite and some IEP testing suites, like IEP-14,
> >>>>>>>>> > IEP-15, etc.
> >>>>>>>>> >
> >>>>>>>>> > Most of the suites above use another in-house developed Java
> >>>>>>>>> > tool, PiClient, to perform the actual loading and miscellaneous
> >>>>>>>>> > operations with the Ignite under test. We use the py4j
> >>>>>>>>> > Python-Java gateway library to control PiClient instances from
> >>>>>>>>> > the tests.
> >>>>>>>>> >
> >>>>>>>>> > When we considered CI, we put TeamCity out of scope, because
> >>>>>>>>> > distributed integration and performance tests tend to run for
> >>>>>>>>> > hours, and TeamCity agents are a scarce and costly resource.
> >>>>>>>>> > So, bundled with Tiden there are jenkins-job-builder [10] based
> >>>>>>>>> > CI pipelines and Jenkins xUnit reporting. Also, a rich web UI
> >>>>>>>>> > tool, Ward, aggregates test run reports across versions and has
> >>>>>>>>> > built-in visualization support for the Combinator suite.
> >>>>>>>>> >
> >>>>>>>>> > All of the above is currently closed source, but we plan to
> >>>>>>>>> > make it public for the community, and publishing the Tiden core
> >>>>>>>>> > [9] is the first step on that way. You can review some examples
> >>>>>>>>> > of using Tiden for tests in my repository [11], for a start.
> >>>>>>>>> >
> >>>>>>>>> > Now, let's compare the Ducktape PoC and Tiden.
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Language
> >>>>>>>>> > Tiden: Python 3.7
> >>>>>>>>> > Ducktape: Python; proposes itself as Python 2.7, 3.6, 3.7
> >>>>>>>>> > compatible, but actually can't work with Python 3.7 due to a
> >>>>>>>>> > broken Zmq dependency.
> >>>>>>>>> > Comment: Python 3.7 has much better support for async-style
> >>>>>>>>> > code, which might be crucial for distributed application
> >>>>>>>>> > testing.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 0
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test writers API
> >>>>>>>>> > The supported integration test framework concepts are basically
> >>>>>>>>> > the same:
> >>>>>>>>> > - a test controller (test runner)
> >>>>>>>>> > - a cluster
> >>>>>>>>> > - a node
> >>>>>>>>> > - an application (a service in Ducktape terms)
> >>>>>>>>> > - a test
> >>>>>>>>> > Score: Tiden: 5, Ducktape: 5
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test selection and runs
> >>>>>>>>> > Ducktape: suite-package-class-method level selection; an
> >>>>>>>>> > internal scheduler allows running the tests in a suite in
> >>>>>>>>> > parallel.
> >>>>>>>>> > Tiden: also suite-package-class-method level selection;
> >>>>>>>>> > additionally allows selecting a subset of tests by attribute;
> >>>>>>>>> > parallel runs are not built in, but it allows merging test
> >>>>>>>>> > reports from different runs.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 2
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test configuration
> >>>>>>>>> > Ducktape: a single JSON string for all tests
> >>>>>>>>> > Tiden: any number of YAML config files, command line options
> >>>>>>>>> > for fine-grained test configuration, and the ability to
> >>>>>>>>> > select/modify test behavior based on the Ignite version.
> >>>>>>>>> > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Cluster control
> >>>>>>>>> > Ducktape: allows executing remote commands at node granularity
> >>>>>>>>> > Tiden: additionally can address the cluster as a whole and
> >>>>>>>>> > execute remote commands in parallel.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Logs control
> >>>>>>>>> > Both frameworks have similar built-in support for remote log
> >>>>>>>>> > collection and grepping. Tiden has a built-in plugin that can
> >>>>>>>>> > zip and collect arbitrary log files from arbitrary locations at
> >>>>>>>>> > test/module/suite granularity and unzip them if needed, plus an
> >>>>>>>>> > application API to search / wait for messages in logs. Ducktape
> >>>>>>>>> > allows each service to declare its log file locations
> >>>>>>>>> > (seemingly does not support log rollover), and has a single
> >>>>>>>>> > entrypoint to collect service logs.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test assertions
> >>>>>>>>> > Tiden: simple asserts, plus a few customized assertion helpers.
> >>>>>>>>> > Ducktape: simple asserts.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Test reporting
> >>>>>>>>> > Ducktape: limited to its own text/HTML format
> >>>>>>>>> > Tiden: provides a text report, a YAML report for reporting
> >>>>>>>>> > tools integration, and an XML xUnit report for integration with
> >>>>>>>>> > Jenkins/TeamCity.
> >>>>>>>>> > Score: Tiden: 3, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Provisioning and deployment
> >>>>>>>>> > Ducktape: can provision a subset of hosts from the cluster for
> >>>>>>>>> > test needs. However, that means that a test can't be scaled
> >>>>>>>>> > without test code changes. Does not do any deployment; relies
> >>>>>>>>> > on external means, e.g. pre-packaged in a docker image, as in
> >>>>>>>>> > the PoC.
> >>>>>>>>> > Tiden: given a set of hosts, Tiden uses all of them for the
> >>>>>>>>> > test. Provisioning should be done by external means. However,
> >>>>>>>>> > it provides conventional automated deployment routines.
> >>>>>>>>> > Score: Tiden: 1, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > Criteria: Documentation and extensibility
> >>>>>>>>> > Tiden: the current API documentation is limited; this should
> >>>>>>>>> > change as we go open source. Tiden is easily extensible via
> >>>>>>>>> > hooks and plugins; see the example Maven plugin and Gatling
> >>>>>>>>> > application at [11].
> >>>>>>>>> > Ducktape: basic documentation at readthedocs.io. The codebase
> >>>>>>>>> > is rigid; the framework core is tightly coupled and hard to
> >>>>>>>>> > change. The only possible extension mechanism is
> >>>>>>>>> > fork-and-rewrite.
> >>>>>>>>> > Score: Tiden: 2, Ducktape: 1
> >>>>>>>>> >
> >>>>>>>>> > I can continue more on this, but it should be enough for now:
> >>>>>>>>> > Overall score: Tiden: 22, Ducktape: 14.
> >>>>>>>>> >
> >>>>>>>>> > Time for discussion!
> >>>>>>>>> >
> >>>>>>>>> > ---
> >>>>>>>>> > [1] - https://www.testcontainers.org/
> >>>>>>>>> > [2] - http://arquillian.org/guides/getting_started/
> >>>>>>>>> > [3] - https://jmeter.apache.org/index.html
> >>>>>>>>> > [4] - https://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>>>> > [5] - https://gatling.io/docs/current/
> >>>>>>>>> > [6] - https://github.com/gridgain/yardstick
> >>>>>>>>> > [7] - https://github.com/gridgain/poc-tester
> >>>>>>>>> > [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> >>>>>>>>> > [9] - https://github.com/gridgain/tiden
> >>>>>>>>> > [10] - https://pypi.org/project/jenkins-job-builder/
> >>>>>>>>> > [11] - https://github.com/mshonichev/tiden_examples
> >>>>>>>>> >
> >>>>>>>>> > On 25.05.2020 11:09, Nikolay Izhikov wrote:
> >>>>>>>>> >> Hello,
> >>>>>>>>> >>
> >>>>>>>>> >> The branch with ducktape has been created -
> >>>>>>>>> >> https://github.com/apache/ignite/tree/ignite-ducktape
> >>>>>>>>> >>
> >>>>>>>>> >> Anyone who is willing to contribute to the PoC is welcome.
> >>>>>>>>> >>
> >>>>>>>>> >>> On May 21, 2020, at 22:33, Nikolay Izhikov <nizhikov....@gmail.com> wrote:
> >>>>>>>>> >>>
> >>>>>>>>> >>> Hello, Denis.
> >>>>>>>>> >>>
> >>>>>>>>> >>> There is no rush with these improvements.
> >>>>>>>>> >>> We can wait for Maxim's proposal and compare the two
> >>>>>>>>> >>> solutions :)
> >>>>>>>>> >>>
> >>>>>>>>> >>>> On May 21, 2020, at 22:24, Denis Magda <dma...@apache.org> wrote:
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Hi Nikolay,
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Thanks for kicking off this conversation and sharing your
> >>>>>>>>> >>>> findings with the results. That's the right initiative. I do
> >>>>>>>>> >>>> agree that Ignite needs to have an integration testing
> >>>>>>>>> >>>> framework with the capabilities you listed.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> As we discussed privately, I would only check whether,
> >>>>>>>>> >>>> instead of Confluent's Ducktape library, we can use an
> >>>>>>>>> >>>> integration testing framework developed by GridGain for
> >>>>>>>>> >>>> testing Ignite/GridGain clusters. That framework has been
> >>>>>>>>> >>>> battle-tested and might be more convenient for
> >>>>>>>>> >>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
> >>>>>>>>> >>>> <mshonic...@gridgain.com>, who promised to join this thread
> >>>>>>>>> >>>> once he finishes preparing the usage examples of the
> >>>>>>>>> >>>> framework. To my knowledge, Max has already been working on
> >>>>>>>>> >>>> that for several days.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> -
> >>>>>>>>> >>>> Denis
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>>>>> >>>>
> >>>>>>>>> >>>>> Hello, Igniters.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> I created a PoC [1] for the integration tests of Ignite.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Let me briefly explain the gap I want to cover:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 1. For now, we don't have a solution for automated testing
> >>>>>>>>> >>>>> of Ignite on a «real cluster».
> >>>>>>>>> >>>>> By «real cluster» I mean a cluster «like production»:
> >>>>>>>>> >>>>> * client and server nodes deployed on different hosts
> >>>>>>>>> >>>>> * thin clients performing queries from some other hosts
> >>>>>>>>> >>>>> * etc.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 2. We don't have a solution for automated benchmarks of
> >>>>>>>>> >>>>> some internal Ignite processes
> >>>>>>>>> >>>>> * PME
> >>>>>>>>> >>>>> * rebalance.
> >>>>>>>>> >>>>> This means we don't know: is rebalance (or PME) in 2.7.0
> >>>>>>>>> >>>>> faster or slower than in 2.8.0 for the same cluster?
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> 3. We don't have a solution for automated testing of Ignite
> >>>>>>>>> >>>>> integrations in a real-world environment:
> >>>>>>>>> >>>>> the Ignite-Spark integration can be taken as an example.
> >>>>>>>>> >>>>> I think some ML solutions should also be tested in
> >>>>>>>>> >>>>> real-world deployments.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Solution:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> I propose to use the ducktape library from Confluent
> >>>>>>>>> >>>>> (Apache 2.0 license).
> >>>>>>>>> >>>>> I tested it both on a real cluster (Yandex Cloud) and in a
> >>>>>>>>> >>>>> local environment (docker), and it works just fine.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> The PoC contains the following services:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> * A simple rebalance test:
> >>>>>>>>> >>>>>       start 2 server nodes,
> >>>>>>>>> >>>>>       create some data with an Ignite client,
> >>>>>>>>> >>>>>       start one more server node,
> >>>>>>>>> >>>>>       wait for rebalance to finish
> >>>>>>>>> >>>>> * A simple Ignite-Spark integration test:
> >>>>>>>>> >>>>>       start 1 Spark master, start 1 Spark worker,
> >>>>>>>>> >>>>>       start 1 Ignite server node,
> >>>>>>>>> >>>>>       create some data with an Ignite client,
> >>>>>>>>> >>>>>       check the data in an application that queries it
> >>>>>>>>> >>>>>       from Spark.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> All tests are fully automated.
> >>>>>>>>> >>>>> Log collection works just fine.
> >>>>>>>>> >>>>> You can see an example of the test report - [4].
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Pros:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> * The ability to test local changes (no need to publish
> >>>>>>>>> >>>>>   changes to some remote repository or similar).
> >>>>>>>>> >>>>> * The ability to parametrize the test environment (run the
> >>>>>>>>> >>>>>   same tests on different JDKs, JVM params, configs, etc.)
> >>>>>>>>> >>>>> * Isolation by default, so system tests are as reliable as
> >>>>>>>>> >>>>>   possible.
> >>>>>>>>> >>>>> * Utilities for pulling up and tearing down services easily
> >>>>>>>>> >>>>>   in clusters in different environments (e.g. local, custom
> >>>>>>>>> >>>>>   cluster, Vagrant, K8s, Mesos, Docker, cloud providers,
> >>>>>>>>> >>>>>   etc.)
> >>>>>>>>> >>>>> * Easy to write unit tests for distributed systems.
> >>>>>>>>> >>>>> * Adopted and successfully used by another distributed
> >>>>>>>>> >>>>>   open-source project - Apache Kafka.
> >>>>>>>>> >>>>> * Collects results (e.g. logs, console output).
> >>>>>>>>> >>>>> * Reports results (e.g. expected conditions met,
> >>>>>>>>> >>>>>   performance results, etc.)
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> WDYT?
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [1] https://github.com/nizhikov/ignite/pull/15
> >>>>>>>>> >>>>> [2] https://github.com/confluentinc/ducktape
> >>>>>>>>> >>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
> >>>>>>>>> >>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg
> >>>>
> >>>> <2020-07-05--004.tar.gz>