On Tue, Oct 29, 2019 at 01:14:30PM -0300, Beraldo Leal wrote:
> Hi all,
>
> So, we have a Trello card [1] to discuss what date/time format we are
> going to adopt when saving date/time on a file.
>
Hi Beraldo,

I don't think I meant that the date/time format to be discussed and
defined would be primarily saved to a file. The mention of it being
used as a "wire-format" was my attempt to signal the primary use.

But let me start with a clearer definition of the current state of the
"nrunner" code. The "avocado nrun" command is, right now, an ad-hoc
implementation of something similar to an Avocado job. A loose
definition of an Avocado job's role is that it runs and collects
results for one or more tests.

The closest thing to collecting results from many jobs that nrunner
has right now is the status server[1], which waits for status messages
on a TCP socket. Those messages are currently encoded as JSON, so the
date/time would have to be encoded as either a JSON string or a JSON
number.

Note: I'm already working on alternative implementations that
integrate the nrunner execution into the existing Avocado Job code, by
writing an "nrunner based" test runner implementation, whose interface
has now been defined[2] and is used even by the regular runner[3].

The "nrunner" based runners, then, have the responsibility of
publishing relevant events, including test start and test end times.
It's this date/time format that I'm most concerned with because, once
those are collected by the results server (or job, depending on the
implementation), they can certainly be stored or presented in an
alternative format if it makes sense to do so.

> I'm moving the discussion here because it seems better to discuss here
> than on Trello.
>

For sure!

> When it comes to date/time storage format, I can think of two very
> well-used standards: 1. Unix Time and 2. ISO 8601.
>
> I’m in favor of the “disambiguation” feature. Read a date/time and not
> have to guess which timezone is a plus.
>
> I think that few questions should be answered before we decide this:
>
> 1. Is storage a problem?
I would certainly like to save a few bytes on each message that
contains a date/time, provided everything else is equal. But, to be
honest, I don't think reading a JSON number as a date (say for Unix
time) or a string (say for ISO 8601) would have a significant impact
on the transmission/processing/storage costs.

I think if we come to the point of needing to optimize the
communication, a more comprehensive change, such as replacing the
protocol/encoding altogether, would probably yield the best results.

> 2. Is a CPU bound problem to parse this date/time?

Like I said before, I doubt that the "status server" would have its
CPU pressured just for parsing the date/time, no matter the format.

I think it's more important that the test runner is given as little
work as possible, though, so that it causes as little disturbance as
possible on the test and on the tested system. Think of low powered
embedded systems running a test, for instance. Being able to use a
native data type and cheap encoding would be favorable IMO.

> 3. Who is going to read this information? Machine or human?
>

Initially the "raw" info is machine readable, even though most people
would agree that JSON is quite human readable. When it comes to the
date/time format itself, a Unix time has poor human readability.

> I believe that by answering these questions, we can go smoothly with
> one format or another, as all languages have libraries to handle it.
>

Agreed. I hope I was able to give my general impression of the
requirements above and to answer those points.

> I have listed below the advantages and disadvantages that I have been
> able to collect so far. Feel free to add or comment about.
>
> # Unix Time / Posix Time / Epoch Time
> ## Advantages:
> * Better for machine readability;
> * Optimized for storage;
> * Very well-known with builtin libraries in many languages;
>
> ## Disadvantages:
> * No timezone support (assumes UTC);
> * Leap seconds are ignored;

That was news to me.
After reading an article[4] I think it doesn't impact our use case.

> * Cannot store values before “1970-01-01 00:00:00 UTC”;

Shouldn't be a problem, as we're not supposed to store tests that
started or ended before that. :)

> * On 32-bit systems there is the “Year 2038 problem”;

This is trickier... and I hate to feel cornered by it, even if, to the
best of my knowledge and assumptions, we won't be dealing with 32-bit
systems by then, or the problem will have been solved / worked around
at another layer.

<joke>TBH, you shouldn't have mentioned this!</joke>

>
> ## Examples using Unix Time:
> * 915148800.25
> * 1095379201.00
>

The presentation aspect is really what bothers me, which is in direct
conflict with the fact that the primary consumers of the nrunner
messages are not humans. But, given that one can easily see that
output by running, say, "avocado runnable-run ...", I was bothered by
it. Anyway, I'm going to dismiss those feelings on the basis of the
primary use cases.

> # ISO 8601
> ## Advantages:
> * Better for human readability;

For sure.

> * Very well-known international standard with builtin libraries in
>   many languages;
>   (First edition in 1988 and updated until nowadays);
> * UTC time zone can be represented by only one “Z” char;

Interesting.

> * The lexicographical order of the representation thus corresponds
>   to chronological order;

Also interesting.

>   (except for date representations involving negative years or time offset);
> * A fraction may be added to the lowest order time element in the
>   representation.
>   (A decimal mark, either a comma or a dot can be used);
> * There is no limit on the number of decimal places for the decimal
>   fraction;

Does this mean that a very high time resolution can be used? This was
one of the questions/concerns I had in the back of my mind...
> * Has support for a basic format (without - or : ) and an extended
>   format with separators added to enhance human readability
>   (The standard notes that: "The basic format should be avoided in
>   plain text.");
>
> ## Disadvantages:
> * Needs more time to parse (not so optimal for machine parsing);

True, but as I've said before, I think the cost of producing it is
more important than the cost of parsing it (as the results server
should have much more resources than the test runner).

> * Needs more space to store;
>

True... for instance, Python's time.time() gives me:

  >>> import datetime, json, time
  >>> len(json.dumps(time.time()))
  18

While for ISO 8601 I get:

  >>> len(json.dumps(datetime.datetime.utcnow().replace(tzinfo=datetime.timezone.utc).isoformat()))
  34

> ## Examples using ISO 8601:
> * 2019-10-29T11:22:32+00:00
> * 2019-10-29T11:22:32Z
> * 20191029T112232Z
>

I like the last example a lot, but that is the one the standard
suggests should not be used, right?

> If the answers to questions 1 and 2 are "no", I think that I would go
> with ISO 8601 using 'Z' as UTC timezone, always.
>
> And you? Any thoughts? Do you have a third option?

I think those two are the real contenders indeed. I'm wondering if
both formats shouldn't be supported by the status server when reading
the messages, so that the writing of native runners would be
facilitated and the load on them would be minimized.

For the runners producing UNIX times, we could even have something
like:

  $ avocado runnable-run ... | ./contrib/scripts/avocado-beautify-status-messages

In the best UNIX tradition.

Thanks for the thorough analysis!
- Cleber.
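
P.S. To make the "accept both formats" idea a bit more concrete, here
is a minimal sketch, not actual nrunner or status server code. The
function names (parse_timestamp, beautify) and the "time" and "status"
message fields are made up for illustration. It assumes that a JSON
number is a Unix time in seconds, that a JSON string is extended-format
ISO 8601, and that a string without an offset means UTC.

```python
import json
from datetime import datetime, timezone


def parse_timestamp(value):
    """Return an aware UTC datetime from a Unix time or an ISO 8601 string."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool):
        raise TypeError("unsupported timestamp: %r" % (value,))
    if isinstance(value, (int, float)):
        # JSON number: seconds since 1970-01-01 00:00:00 UTC
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # JSON string: extended ISO 8601; older Pythons reject a bare "Z"
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Assumption: a timestamp without an offset is taken as UTC
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise TypeError("unsupported timestamp: %r" % (value,))


def beautify(line):
    """Rewrite the (hypothetical) "time" field of one JSON message as ISO 8601."""
    message = json.loads(line)
    if "time" in message:
        stamp = parse_timestamp(message["time"])
        message["time"] = stamp.isoformat().replace("+00:00", "Z")
    return json.dumps(message)
```

A filter like the beautify script mentioned above could then call
beautify() on each line read from stdin, leaving the runners free to
emit plain time.time() values.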
>
> [1] - https://trello.com/c/w4iFhDfM
>
> Regards,
> --
> Beraldo Leal
> Senior Software Engineer, Virtualization Team
> Red Hat
>

[1] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/nrunner.py#L522
[2] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/plugin_interfaces.py#L290
[3] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/setup.py#L128
[4] https://derickrethans.nl/leap-seconds-and-what-to-do-with-them.html