Hi Cleber,

I will try to be brief this time:

On 2020-05-21 02:32, Cleber Rosa wrote:
> Intro
> =====
> 
> This is a more technical follow-up to the points given in a previous
> thread.  Because that thread and the current N(ext) Runner documentation
> provide good context for this proposal, I encourage everyone to read them
> first:
> 
>   https://www.redhat.com/archives/avocado-devel/2020-May/msg00009.html
> 
>   https://avocado-framework.readthedocs.io/en/79.0/future/core/nrunner.html
> 
> The N(ext) Runner allows for greater flexibility than the current
> runner, so to be effective in delivering the N(ext) Runner for general
> usage, we must define the bare minimum that still needs to be
> implemented.

In fact, I would prefer that we get more technical, so that we have
clearer mental models of what we are talking about and what remains to
be decided.

> Basic Job and Task execution
> ============================
> 
> A Task, within the context of the N(ext) Runner, is described as "one
> specific instance/occurrence of the execution of a runnable with its
> respective runner".
> 
> A Task is a very important building block for an Avocado Job, and running
> an Avocado Job means, to a large extent, running a number of Tasks.
> The Tasks that need to be executed in a Job are created during
> the ``create_test_suite()`` phase:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.create_test_suite
> 
> And are kept in the Job's ``test_suite`` attribute:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.test_suite

So I guess varianters are meant to be incorporated as resolvers, no
longer providing simple test factories but resolutions wrapped into
tasks. As we already support, a single reference on the command line
can result in thousands of loaded/resolved tests/runnables. It could
reach the point where splitting tens of thousands of tests into
separate tasks becomes a parallelism bottleneck. Have we considered
wrapping multiple tests or runnables into a single task? Is there any
good way to resolve this, given that task handling will be enforced
per runnable?
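
To make the concern concrete, here is a rough sketch of what I mean by
wrapping several runnables into one schedulable unit. Everything here
(the BatchTask name, the round-robin split) is hypothetical and not
part of the current nrunner API:

    class BatchTask:
        """One schedulable unit carrying several runnables (hypothetical)."""

        def __init__(self, identifier, runnables):
            self.identifier = identifier
            self.runnables = list(runnables)


    def batch_runnables(runnables, max_tasks):
        """Split runnables into at most max_tasks batches, round-robin."""
        count = min(max_tasks, len(runnables)) or 1
        buckets = [[] for _ in range(count)]
        for index, runnable in enumerate(runnables):
            buckets[index % count].append(runnable)
        return [BatchTask("batch-%d" % number, bucket)
                for number, bucket in enumerate(buckets, start=1)]

With something like this, ten thousand resolved runnables could still
be handled as, say, 64 tasks, so the per-task overhead stays bounded.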

> Running the tests, then, happens during the ``run_tests()`` phase:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.job.Job.run_tests
> 
> During the ``run_tests()`` phase, a plugin that runs test suites on a
> job is chosen, based on the ``run.test_runner`` configuration.
> The current "work in progress" implementation for the N(ext) Runner,
> can be activated either by setting that configuration key to ``nrunner``,
> which can be easily done on the command line too::
> 
>   avocado run --test-runner=nrunner /bin/true
> 
> A general rule for measuring the quality and completeness of the
> ``nrunner`` implementation is to run the same jobs with the current
> runner, and compare its behavior and output with that of the
> ``nrunner``.  From here on, we'll call this simply the "nrunner
> plugin".

Then I guess custom runners that implement extra behavior (in my case,
for instance, a graph traversal of test nodes) should simply inherit
from the nrunner? Or will the order and overall management of how and
which tests are run no longer be controllable? Or should it be
delegated to classes inheriting from only parts of the runner
implementation, like custom schedulers and/or spawners? In that case,
won't the setting for selecting a runner become redundant, since we
will always be able to select just one runner?
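
To illustrate the kind of extension point I have in mind, here is a
hypothetical sketch: a suite runner that only reorders tasks (a
stand-in for my graph traversal of test nodes) and delegates
everything else to the nrunner plugin. NRunnerPlugin and run_suite()
below are stand-ins for whatever the real plugin class and entry point
end up being:

    class NRunnerPlugin:
        """Stand-in for the real nrunner suite-runner plugin."""

        def run_suite(self, job, test_suite):
            print("running %d tasks" % len(test_suite))
            return set()


    class GraphTraversalRunner(NRunnerPlugin):
        """Same tasks, but in an order decided by a dependency graph."""

        def order_tasks(self, tasks):
            # placeholder for a real graph traversal of test nodes
            return sorted(tasks, key=str)

        def run_suite(self, job, test_suite):
            return super().run_suite(job, self.order_tasks(test_suite))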

> Known issues and limitations of the current implementation
> ==========================================================
> 
> Different Test IDs
> ------------------
> 
> When running tests with the current runner, the Test IDs are different::
> 
>    $ avocado run --test-runner=runner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-/bin/true",
>             "id": "2-/bin/false",
>             "id": "3-/bin/uname",
> 
>    $ avocado run --test-runner=nrunner --json=- -- /bin/true /bin/false /bin/uname | grep \"id\"
>             "id": "1-1-/bin/true",
>             "id": "2-2-/bin/false",
>             "id": "3-3-/bin/uname",
> 
> The goal is to make the IDs the same.

I guess this is only necessary for compatibility once the switch is
done. Our custom runner produces a different type of ID that is really
architecture specific, and I think the freedom to define test IDs
should remain with developers, at least as long as they still have the
freedom to resolve and control how their own test suites run, while
hopefully keeping some of the large parallelism improvements above. Is
compatibility the idea behind calling this a limitation, or am I
missing something? I saw no explicit explanation of why this is a
problem, so I guess it must be compatibility.

> Inability to run Tasks other than exec, exec-test, python-unittest (and noop)
> -----------------------------------------------------------------------------
> 
> The current implementation of the nrunner plugin is based on the fact that
> Tasks are already present in the ``test_suite`` job attribute, and
> that running Tasks can be (but shouldn't always be) a matter of
> iterating over the result of its ``run()`` method.  This is part of
> the actual code::
> 
>     for status in task.run():
>       result_dispatcher.map_method('test_progress', False)
>       statuses.append(status)
> 
> The problem here is that only the Python classes implemented in the core
> "avocado.core.nrunner" module, and registered at:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.html#avocado.core.nrunner.RUNNERS_REGISTRY_PYTHON_CLASS
> 
> The goal is to have all other Python classes that inherit from
> "avocado.core.nrunner.BaseRunner" available in such a registry.

I guess this relates to making task management flexible, as the next
to last sentence there is not finished. If so, then I guess this
reduces to the custom test suite running limitations I mentioned
above.
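
For what it's worth, this is the kind of out-of-core registration I
would hope stays possible. I am assuming here that the registry is a
plain mapping from runnable kind to BaseRunner subclass, and the
status message layout below is only my reading of the docs:

    from avocado.core import nrunner


    class MyKindRunner(nrunner.BaseRunner):
        """Runner for a hypothetical 'my-kind' runnable."""

        def run(self):
            # real runners also emit 'started' and periodic 'running'
            # messages; this is just the minimal final message
            yield {'status': 'finished', 'result': 'pass'}


    nrunner.RUNNERS_REGISTRY_PYTHON_CLASS['my-kind'] = MyKindRunner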

> Inability to run Tasks with Spawners
> ------------------------------------
> 
> While the "avocado nrun" command makes use of the Spawners, the
> current implementation of the nrunner plugin described earlier,
> calls a Task's ``run()`` method directly, and clearly doesn't
> use spawners.
> 
> The goal here is to leverage spawners so that other isolation
> models (or execution environments, depending how you look at
> processes, containers, etc) are supported.

+1 on prioritizing this, since the current blueprint is confusing with
tasks handled both directly and through spawners.
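
From my reading of the nrun code, the intended shape is roughly the
following; ProcessSpawner and an awaitable spawn_task() are my
assumptions about the spawner API, so take the names with a grain of
salt:

    import asyncio

    from avocado.core.spawners.process import ProcessSpawner


    async def spawn_all(tasks):
        # run every task through the spawner instead of calling its
        # run() method directly in the job process
        spawner = ProcessSpawner()
        for task in tasks:
            await spawner.spawn_task(task)

    # rough usage: asyncio.run(spawn_all(job.test_suite))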

> Unoptimized execution of Tasks (extra serialization/deserialization)
> --------------------------------------------------------------------
> 
> At this time, the nrunner plugin runs a Task directly through its
> ``run()`` method.  Besides the earlier point of not supporting
> other isolation models/execution environments (that means not using
> spawners), there's an extra layer of work happening when running
> a task which is most often not necessary: turning a Task instance
> into a command line, and within its execution, turning it into a
> Task instance again.
> 
> The goal is to support an optimized execution of the tasks, without
> having to turn them into command lines, and back into Task instances.
> The idea is already present in the spawning method definitions:
> 
>   https://avocado-framework.readthedocs.io/en/79.0/api/core/avocado.core.spawners.html#avocado.core.spawners.common.SpawnMethod.PYTHON_CLASS
> 
> And a PoC on top of the ``nrun`` command was implemented here:
> 
>   https://github.com/avocado-framework/avocado/pull/3766/commits/ae57ee78df7f2935e40394cdfc72a34b458cdcef

Speaking from the perspective of the past, what Autotest used to do to
run code on guest VMs (or containers, or anything that only assumes a
remote address) was to deploy control files to the remote/precooked
environment and run them there. I still preserve this in the "remote
door" utility mentioned in the previous thread, and in addition I
added minimal metaprogramming capabilities (Python scripts writing
minimal Python scripts, among others) and Python remote object
serialization through Pyro.

I mention this because I think it could be useful to look at some of
these approaches and reuse as much as possible from them. Some of them
also offer optimizations for more specific test scenarios.
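
As a minimal illustration of the remote object approach (using Pyro4
here purely as an example; the real "remote door" utility does more),
the shape of it is:

    import Pyro4


    @Pyro4.expose
    class RemoteRunner:
        def run_test(self, reference):
            # remote side: resolve and run the test, return a status dict
            return {"reference": reference, "result": "pass"}


    def serve():
        # remote side: publish the object and wait for calls
        daemon = Pyro4.Daemon()
        uri = daemon.register(RemoteRunner(), objectId="remote.runner")
        print(uri)
        daemon.requestLoop()


    def run_remotely(uri, reference):
        # local side: call the remote object as if it were local
        return Pyro4.Proxy(uri).run_test(reference)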

> Task execution coordination goals
> ---------------------------------
> 
> As stated earlier, to run a job, tasks must be executed. Differently
> than the current runner, the N(ext) Runner architecture allows those
> to be executed in a much more decoupled way. This characteristic will
> be maintained, but it needs to be adapted into the current Job
> execution.
> 
> From a high level view, the nrunner plugin needs to:
> 
> 1. Break apart from the "one at a time" Task execution model that it
>    currently employs;
> 
> 2. Check if a Task can be executed, that is, if its requirements can
>    be fulfilled (the most basic requirement for a task is a matching
>    runner);
> 
> 3. Prepare for the execution of a task, such as the fulfillment of
>    extra task requirements. The requirements resolver is one, if not
>    the only, component that should be given a chance to act here;

The requirements resolver should remain possible to inherit from and
customize, though. Downloading all assets from the internet has
security implications and bandwidth overhead, and is simply not
universal across all test dependency scenarios. So at least keeping
the door open here will no doubt be useful in the long term.
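
Purely as an illustration of the customization I mean (all names here
are made up, since the requirements resolver API is not settled), a
"package" requirement could be fulfilled from an internal mirror
rather than from the internet:

    import subprocess


    class LocalMirrorPackageResolver:
        """Fulfills 'package' requirements from an internal mirror only."""

        def __init__(self, mirror_repo):
            self.mirror_repo = mirror_repo

        def fulfill(self, requirement):
            if requirement.get("type") != "package":
                return False
            result = subprocess.run(["dnf", "install", "-y",
                                     "--repo", self.mirror_repo,
                                     requirement["name"]],
                                    check=False)
            return result.returncode == 0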

> 4. Execute a task in the prepared environment;
> 
> 5. Monitor the execution of a task (from an external PoV);
> 
> 6. Collect the status messages that tasks will send;
> 
>    a. Forward the status messages to the appropriate job components,
>       such as the result plugins.
> 
>    b. Depending on the content of messages, such as the ones
>       containing "status: started" or "status: finished", interfere in
>       the Task execution status, and consequently, in the Job
>       execution status.
> 
> 7. Verify, warn the user, and attempt to clean up stray tasks.  This
>    may be necessary, for instance, if a Task on a container seems to
>    be stuck and the container can not be destroyed.  The same applies
>    to processes in some kind of uninterruptible sleep.

+1
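
Just to check my understanding of steps 2 through 6, here is how I
picture them fitting together, as a toy, self-contained loop; every
name below is mine and not the nrunner API:

    import asyncio


    async def fake_task(name):
        """Stands in for steps 4-5: a spawned task emitting status messages."""
        yield {"id": name, "status": "started"}
        await asyncio.sleep(0.01)
        yield {"id": name, "status": "finished", "result": "pass"}


    async def coordinate(name, known_kinds, statuses):
        if "exec" not in known_kinds:          # step 2: is there a runner?
            statuses.append({"id": name, "status": "finished",
                             "result": "cancel"})
            return
        # step 3 (requirement fulfillment) would happen here
        async for message in fake_task(name):  # steps 4, 5 and 6
            statuses.append(message)           # 6a/6b: forward and interpret


    async def main():
        statuses = []
        await asyncio.gather(*(coordinate("task-%d" % i, {"exec"}, statuses)
                               for i in range(3)))
        return statuses


    print(asyncio.run(main()))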

> Parallelization
> ---------------
> 
> Because the N(ext) Runner features allow for parallel execution of tasks,
> all other aspects of task execution coordination (fulfilling requirements,
> collecting results, etc) should not block each other.
> 
> There are a number of strategies for concurrent programming in Python
> these days, and the "avocado nrun" command currently makes use of
> asyncio to have coroutines that spawn tasks and collect results
> concurrently (in a cooperative, non-preemptive model).  The actual
> language or library features used are, IMO, less important than the
> end result.

I think this requirement might be too strong. I am aware one could
disable parallel runs and go entirely sequential, but that is too
strong as well. I think the most configurable approach is in the
middle: if tasks were allowed to have sequential asset dependencies,
or even mutual dependencies, and a scheduler executed only the tasks
whose requirements are currently satisfied, this would be far more
flexible.
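
A self-contained sketch of that middle ground (illustrative only, not
existing Avocado code): run tasks in parallel, but only those whose
declared dependencies have already finished:

    import asyncio


    async def run_one(name):
        await asyncio.sleep(0.01)   # stands in for the real task execution
        return name


    async def schedule(dependencies):
        """dependencies: mapping of task name -> set of names it needs."""
        done = set()
        pending = dict(dependencies)
        while pending:
            ready = [name for name, deps in pending.items() if deps <= done]
            if not ready:
                raise RuntimeError("circular or unsatisfiable dependencies")
            done.update(await asyncio.gather(*(run_one(n) for n in ready)))
            for name in ready:
                del pending[name]
        return done

    deps = {"build": set(), "install": {"build"},
            "test": {"install"}, "lint": set()}
    print(asyncio.run(schedule(deps)))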

> Suggested terminology
> ---------------------

+1 on all items

> Task workflow
> -------------
> ...
> Iteration I
> ~~~~~~~~~~~
> 
> Task #1 is selected on the first iteration, and it's found that:
> 
> 1. A suitable runner for tasks of kind ``python-unittest`` exists
> 
> 2. The ``mylib.py`` requirement is already present on the current
>    environment
> 
> 3. The ``gcc`` and ``libc-devel`` packages are not installed in the
>    current environment
> 
> 4. The system is capable of *attempting* to fulfill "package" types of
>    requirements.

I guess by a capable system you mean the system performing additional
actions to modify the state of the environment. Could there be any
kind of support for undoing such changes? If there are 1000 tasks with
the same dependency, the first iteration will provide it and the
remaining 999 tasks will reuse it, which is great; but what if a later
task requires the environment to be brought back to the previous
state? We use LVM to switch between and track provisioned snapshots,
which could be something useful to think about here for the future.
Such an implementation would be faster than downloading/preparing new
environments from scratch.
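
For reference, the LVM switching we do boils down to something like
the following (volume names are placeholders); taking a snapshot
before mutating the environment and merging it back is much faster
than preparing a new environment from scratch:

    import subprocess


    def take_snapshot(vg, lv, snapshot_name, size="10G"):
        # snapshot the origin volume before the environment is modified
        subprocess.run(["lvcreate", "--size", size, "--snapshot",
                        "--name", snapshot_name, "%s/%s" % (vg, lv)],
                       check=True)


    def restore_snapshot(vg, snapshot_name):
        # merging the snapshot reverts the origin to the snapshotted state
        subprocess.run(["lvconvert", "--merge",
                        "%s/%s" % (vg, snapshot_name)],
                       check=True)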

> Tallying results
> ~~~~~~~~~~~~~~~~
> 
> The nrunner plugin should be able to provide meaningful results to the Job,
> and consequently to the user, based on the resulting information on the
> final iteration.
> 
> Notice that some information, such as the ``PASS`` for the first
> test, will come from the "result" given in a status message from the
> task itself.  Some other status, such as the ``INTERRUPTED`` status
> for the second test, will not come from a received status message,
> but from what the plugin itself observes while managing the task
> execution.  It's expected that other information will also have to
> be inferred, and "filled in", by the nrunner plugin implementation.
> 
> In the end, it's expected that results similar to this would be
> presented::
> 
>   JOB ID     : f59bd40b8ac905864c4558dc02b6177d4f422ca3
>   JOB LOG    : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/job.log
>    (1/2) tests.py:Test.test_2: PASS (2.56 s)
>    (2/2) tests.py:Test.test_1: INTERRUPT (900 s)
>   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
>   JOB TIME   : 0.19 s
>   JOB HTML   : /home/cleber/avocado/job-results/job-2020-05-20T17.58-f59bd40/results.html
> 
> Notice how Task #2 shows up before Task #1, because it was both started
> first and finished earlier.  There may be issues associated with the
> current UI to be dealt with regarding out-of-order task status updates.

Perhaps the status collection could update itself regularly and
provide an ongoing view of the number of collected PASS, ERROR,
INTERRUPT, and so on? If it provides such a view, it could always sort
the received statuses.
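
For example, a running tally could be kept as status messages arrive,
and the view sorted by the task's suite index before rendering, so
out-of-order completion does not scramble the display (the message
fields below are my assumptions):

    from collections import Counter


    def render_progress(status_messages):
        finished = [m for m in status_messages
                    if m.get("status") == "finished"]
        tally = Counter(m.get("result", "unknown").upper() for m in finished)
        lines = ["RESULTS (so far): " +
                 " | ".join("%s %d" % item for item in sorted(tally.items()))]
        for message in sorted(finished, key=lambda m: m.get("index", 0)):
            lines.append("  (%s) %s: %s" % (message.get("index"),
                                            message.get("id"),
                                            message.get("result", "").upper()))
        return "\n".join(lines)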

> Summary
> =======
> 
> This proposal contains a number of items that can become GitHub issues
> at this stage.  It also contains a general explanation of what I believe
> are the crucial missing features to make the N(ext) Runner implementation
> available to the general public.
> 
> Feedback is highly appreciated, and it's expected that this document will
> evolve into a better version, and possibly become a formal Blue Print.
> 
> Thanks,
> - Cleber.
> 

I hope you find some of my comments useful and are willing to provide
further comments, so that we can all contribute and coordinate on what
is to become a rather large reimplementation.

Best,
Plamen
