Re: [Avocado-devel] NRunner: decide on a "wire-format" for time/dates

Cleber Rosa Thu, 31 Oct 2019 09:25:01 -0700

On Tue, Oct 29, 2019 at 01:14:30PM -0300, Beraldo Leal wrote:
> Hi all,
> 
> So, we have a Trello card [1] to discuss what date/time format we are
> going to adopt when saving date/time on a file.
>


Hi Beraldo,

I don't think I meant for the date/time format under discussion to be
primarily about saving to a file.  Calling it a "wire-format" was my
attempt to signal the primary use.  But let me start with a clearer
description of the current state of the "nrunner" code.

The "avocado nrun" command is, right now, an ad-hoc implementation of
something similar to an Avocado job.  A loose definition of an Avocado
job's role is that it runs and collects results for one or more tests.
The closest thing nrunner has right now to collecting results from
many jobs is the status server[1], which waits for status messages on
a TCP socket.  Those messages are currently encoded as JSON, so the
date/time would have to be encoded as either a JSON string or a JSON
number.
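
To make the two options concrete, here is a minimal sketch of one
status message carrying its time first as a JSON number and then as a
JSON string.  The message keys ("status" and "time") are made up for
illustration and are not necessarily what nrunner sends today.

```python
import datetime
import json

# A fixed instant, so that the output is reproducible
moment = datetime.datetime(2019, 10, 31, 16, 25, 1,
                           tzinfo=datetime.timezone.utc)

# Option 1: Unix time, carried as a JSON number
# ("status" and "time" are hypothetical keys)
as_number = json.dumps({"status": "started", "time": moment.timestamp()})

# Option 2: ISO 8601, carried as a JSON string
as_string = json.dumps({"status": "started", "time": moment.isoformat()})

print(as_number)
print(as_string)
```

Either way the message remains plain JSON; only the type of the "time"
value changes.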

Note: I'm already working on an alternative implementation that
integrates the nrunner execution into the existing Avocado Job code, by
writing a "nrunner based" test runner implementation, whose interface
has now been defined[2] and is used even by the regular runner[3].

The "nrunner" based runners, then, have the responsibility of publishing
relevant events, including test start and test end times.  It's this
date/time format that I'm most concerned with because, once those
events are collected by the results server (or job, depending on the
implementation), they can certainly be stored or presented in an
alternative format if it makes sense to do so.

> I'm moving the discussion here because it seems better to discuss here
> than on Trello.
>

For sure!

> When it comes to date/time storage format, I can think of two very
> well-used standards: 1. Unix Time and 2. ISO 8601.
> 
> I’m in favor of the “disambiguation” feature. Read a date/time and not
> have to guess which timezone is a plus.
> 
> I think that few questions should be answered before we decide this:
> 
>   1. Is storage a problem?

I would certainly like to save a few bytes on each message that
contains a date/time, provided everything else is equal.

But, to be honest, I don't think reading a JSON number as a date (say
for Unix time) or a string (say for ISO 8601) would have a significant
impact on the transmission/processing/storage costs.  I think if we
come to the point of needing to optimize the communication, a more
comprehensive change, such as replacing the protocol/encoding
altogether, would probably yield the best results.

>   2. Is a CPU bound problem to parse this date/time?

Like I said before, I doubt that the "status server" would have its
CPU pressured just for parsing the date/time, no matter the format.  I
think it's more important that the test runner is given as little work
as possible, though, so that it causes as little disturbance as
possible on the test and on the tested system.  Think of low powered
embedded systems running a test, for instance.  Being able to use a
native data type and cheap encoding would be favorable IMO.

>   3. Who is going to read this information? Machine or human?
>

Initially the "raw" info is machine readable, even though most people
would agree that JSON is quite human readable.  When it comes to the
date/time format itself, a Unix time has poor human readability.

> I believe that by answering these questions, we can go smoothly with
> one format or another, as all languages have libraries to handle it.
>

Agreed.  I hope I was able to give my general impression on the
requirements above and answered those points.

> I have listed below the advantages and disadvantages that I have been
> able to collect so far. Feel free to add or comment about.
> 
> # Unix Time / Posix Time / Epoch Time
> ## Advantages:
>   * Better for machine readability;
>   * Optimized for storage;
>   * Very well-known with builtin libraries in many languages;
> 
> ## Disadvantages:
>   * No timezone support (assumes UTC);
>   * Leap seconds are ignored;

That was news to me.  After reading an article[4] I think it doesn't
impact our use case.

>   * Cannot store values before “1970-01-01 00:00:00 UTC”;

Shouldn't be a problem, as we're not supposed to store tests started
or that have ended before that. :) 

>   * On 32-bit systems there is the “Year 2038 problem”;

This is trickier... and I hate to feel cornered by it.  Even if, to
the best of my knowledge and assumptions, we won't be dealing with
32-bit systems by then, or, the problem would have been solved /
worked around at another layer.

<joke>TBH, you shouldn't have mentioned this!</joke>
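
For reference, the limit is easy to check.  This is only a sketch of
the arithmetic: the largest signed 32-bit value, read as seconds since
the epoch, is the last moment such a counter can represent.  A JSON
number has no such limit in itself, so the problem would only appear
where the value is stored in a signed 32-bit type.

```python
import datetime

# Largest value a signed 32-bit integer can hold
INT32_MAX = 2**31 - 1

# Interpret it as seconds since the Unix epoch, in UTC
last_moment = datetime.datetime.fromtimestamp(INT32_MAX,
                                              tz=datetime.timezone.utc)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
```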

> 
> ## Examples using Unix Time:
>   * 915148800.25
>   * 1095379201.00
>

The presentation aspect is really what bothers me, which is in direct
conflict with the fact that the primary consumers of the nrunner
messages are not humans.  But, given that one can easily see that output
by running, say, "avocado runnable-run ...", I was bothered by it.

Anyway, I'm going to dismiss those feelings on the basis of the
primary use cases.

> # ISO 8601
> ## Advantages:
>   * Better for human readability;

For sure.

>   * Very well-known international standard with builtin libraries in
> many languages;
>      (First edition in 1988 and updated until nowadays);
>   * UTC time zone can be represented by only one “Z” char;

Interesting.

>   * The lexicographical order of the representation thus corresponds
> to chronological order;

Also interesting.

>     (except for date representations involving negative years or time offset);
>   * A fraction may be added to the lowest order time element in the
> representation.
>     (A decimal mark, either a comma or a dot can be used);
>   * There is no limit on the number of decimal places for the decimal 
> fraction;

Does this mean that a very high time resolution can be used?  This was
one of the questions/concerns I had on the back of my mind...
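
If the format itself puts no limit on the fraction, the practical
limit comes from the tools.  Python's datetime objects, for instance,
only keep microseconds.  A rough sketch of producing nanosecond
resolution anyway, by formatting the fraction by hand (the function
name is mine, and time.time_ns() needs Python 3.7 or later):

```python
import datetime
import time


def iso8601_from_ns(nanoseconds):
    """Format nanoseconds since the Unix epoch as ISO 8601 in UTC,
    with nine digits in the fraction (hypothetical helper)."""
    seconds, fraction = divmod(nanoseconds, 1_000_000_000)
    # datetime only holds the whole seconds here; the nanosecond
    # fraction is appended by hand
    whole = datetime.datetime.fromtimestamp(seconds,
                                            tz=datetime.timezone.utc)
    return "%s.%09dZ" % (whole.strftime("%Y-%m-%dT%H:%M:%S"), fraction)


print(iso8601_from_ns(time.time_ns()))
```

Whoever reads such a value would also have to handle the extra digits
itself, so this only pays off if we really need that resolution.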

>   * Has support for a basic format (without - or : ) and an extended
> format with separators added to enhance human readability
>   (The standard notes that: "The basic format should be avoided in
> plain text.");
> 
> ## Disadvantages:
>   * Needs more time to parse (not so optimal for machine parsing);

True, but as I've said before, I think the cost of producing it is
more important than the cost of parsing it (as the results server
should have much more resources than the test runner).

>   * Needs more space to store;
>

True... for instance, Python's time.time() gives me:

   >>> import datetime, json, time
   >>> len(json.dumps(time.time()))
   18

While for ISO 8601 with an explicit UTC offset:

   >>> len(json.dumps(datetime.datetime.utcnow().replace(tzinfo=datetime.timezone.utc).isoformat()))
   34

> ## Examples using ISO 8601:
>   * 2019-10-29T11:22:32+00:00
>   * 2019-10-29T11:22:32Z
>   * 20191029T112232Z
>

I like the last example a lot, but that is the basic format, which the
standard says should be avoided in plain text, right?

> If the answers to questions 1 and 2 are "no", I think that I would go
> with ISO 8601 using 'Z' as UTC timezone, always.
> 
> And you? Any thoughts? Do you have a third option?

I think those two are the real contenders indeed.  I'm wondering if
both formats shouldn't be supported by the status server when reading
the messages, so that the writing of native runners would be
facilitated and the load on them would be minimized.
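
A minimal sketch of what accepting both could look like on the
reading side.  The function name is hypothetical and this is not
existing status server code; it simply dispatches on the JSON type of
the value and normalizes everything to an aware UTC datetime.

```python
import datetime


def parse_time(value):
    """Return an aware UTC datetime from either a Unix time (JSON
    number) or an ISO 8601 string (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int, but true/false is not a time
        raise TypeError("not a date/time: %r" % (value,))
    if isinstance(value, (int, float)):
        return datetime.datetime.fromtimestamp(value,
                                               tz=datetime.timezone.utc)
    if isinstance(value, str):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            raise ValueError("no UTC offset in %r" % (value,))
        return parsed.astimezone(datetime.timezone.utc)
    raise TypeError("not a date/time: %r" % (value,))


print(parse_time(1572348152))
print(parse_time("2019-10-29T11:22:32Z"))
```

Both calls describe the same instant, so whatever comes after the
parsing step would not need to know which format the runner chose.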

For the runners producing UNIX times, we could even have something like:

 $ avocado runnable-run ... | ./contrib/scripts/avocado-beautify-status-messages

In the best UNIX tradition.
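
Such a script does not exist yet, so the following is just a sketch of
the idea.  It assumes one JSON message per line and a numeric "time"
key, both of which are assumptions of mine and not what nrunner is
known to emit.  It reads from an in-memory sample here to stay
self-contained.

```python
import datetime
import io
import json


def beautify(line, key="time"):
    """Rewrite a numeric Unix time found under `key` as an ISO 8601
    string in UTC, leaving anything else untouched."""
    message = json.loads(line)
    value = message.get(key) if isinstance(message, dict) else None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        moment = datetime.datetime.fromtimestamp(value,
                                                 tz=datetime.timezone.utc)
        message[key] = moment.isoformat()
    return json.dumps(message)


def beautify_stream(lines, key="time"):
    """Yield one beautified message per non-empty input line."""
    for line in lines:
        if line.strip():
            yield beautify(line, key)


# Hypothetical message, standing in for the runner's output
sample = io.StringIO('{"status": "finished", "time": 1572348152}\n')
for pretty in beautify_stream(sample):
    print(pretty)
```

Passing sys.stdin instead of the sample would turn this into the kind
of filter used in the pipeline above.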

Thanks for the thorough analysis!
- Cleber.

> 
> [1] - https://trello.com/c/w4iFhDfM
> 
> Regards,
> -- 
> Beraldo Leal
> Senior Software Engineer, Virtualization Team
> Red Hat
> 

[1] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/nrunner.py#L522
[2] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/plugin_interfaces.py#L290
[3] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/setup.py#L128
[4] https://derickrethans.nl/leap-seconds-and-what-to-do-with-them.html
