Thanks for the detailed summary.

I agree with the goal of using the same time source across all tests
if possible. Originally, I was hoping this would be 'builtin', but
that'd require deploying MiniChronyd too, both to avoid a dependency
on Internet connectivity and to ensure that clock synchronization at
test startup is sufficiently fast. And, like you said, not only is
that a good chunk of work, it's also somewhat antithetical to those
pure "in-proc" test setups.

So I reluctantly agree that 'system_unsync' may be the next best
option, with targeted uses of 'system' and 'builtin' for testing
specific functionality. That said, is it safe that 'system_unsync'
uses CLOCK_REALTIME under the hood, which is affected by NTP and admin
time changes?

On Sun, Oct 6, 2019 at 11:27 PM Alexey Serbin
<aser...@cloudera.com.invalid> wrote:
>
> Hi,
>
> Recently, built-in NTP client was been introduced in Kudu.  Todd Lipcon
> put together the original WIP patch a couple of years ago [1], and about a
> week
> ago the built-in NTP client was merged into the master branch of the Kudu
> repo.
> With that, a new time source is now available for Kudu masters and tablet
> servers: the built-in NTP client.
>
> With the introduction of the new time source for Kudu components, we have
> had a few offline discussions about using different time sources in Kudu
> tests.  That's not only about providing test coverage for the newly
> introduced
> built-in NTP client, but for all other tests as well.  Last week Adar and I
> talked about that offline one more time.  I'll try to summarize few key
> points in this e-mail message.
>
> With the introduction of the built-in NTP client, a significant part of
> tests
> was switched to run with it as the time source.  Particularly, all tests
> based on ExternalMiniCluster are now run with built-in NTP client with
> commit
> 03e2ada69 [2].  The idea is to have more coverage for the newly added
> functionality, especially given the fact that at some point we might switch
> to use the built-in NTP client by default instead of relying on the local
> machine clock synchronized by the system NTP service.
>
> There are many other Kudu tests (e.g., tests based on InternalMiniCluster)
> which still require the machine clock to be synchronized with NTP to run.
> Ideally, it would be great to run all Kudu tests with using same time source
> unless they are specifically targeted to verify functionality of a
> particular
> time source.  An example of the latter are tests which cover the
> functionality
> of the built-in NTP client itself.
>
> Prior to the introduction of the built-in NTP client, almost all Kudu tests
> were running against the local system clock synchronized with NTP, at least
> on Linux.  The only exception were tests for read-your-writes and other
> consistency guarantees: those use mock time source to control the time
> in a very specific way.
>
> In retrospective, it was great to have a single time source because we
> always
> knew that the same time source was used for almost all of our tests.  Also,
> the time source used in Kudu tests was the same as in Kudu masters and
> tablet
> servers running in real distributed clusters.  From the other side, that
> required local machine's clock to be synchronized with NTP to run those
> tests,
> and the tests would fail if the time was not synchronized with NTP.
> In addition, as far as I know, there are no Kudu tests which are targeted
> to detect inconsistencies due to irregularities in the time source or
> non-synchronized clocks between different Kudu nodes.  Of course, we have
> Jepsen-based tests, but that's another story: we are not talking about
> Jepsen
> tests here, only C++ and Java tests based on gtest and JUnit frameworks.
>
> Now, here comes one observation: all the components of above mentioned tests
> are running at the same machine during execution of corresponding scenarios.
> If all of them are using the local machine's clock, then there is no need
> to use NTP or other synchronisation techniques designed to synchronize
> multiple clocks.
>
> So, here comes the question: why to require the local clock to be
> synchronized
> with NTP for tests?  Yes, we need to verify that the 'system' time source
> works for HybridTime timestamps, but as for the functionality of a generic
> Kudu test which is simply based on the HybridTime timestamps without any
> knowledge of the underlying time source, why is it important?
>
> It seems we can safely change the time source in those generic tests to be
> 'system_unsync': local machine clock which is not necessarily synchronized
> by
> the kernel NTP discipline (see SystemUnsyncTime in
> $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).
>
> Once switched to 'system_unsync' time source for tests, in addition it will
> be necessary to add new dedicated tests scenarios such as:
>   * Smoke scenarios: using the system NTP time source for Kudu clusters.
>   * Smoke scenarios: using the built-in NTP time source for Kudu clusters.
>   * Advanced scenarios to detect issues due to irregularities in time source
>     (like time jumping back and forth, etc.).
>
> What do you think?  Your feedback is appreciated.
>
> Thank you!
>
>
>
> Kind regards,
>
> Alexey
>
> [1] http://gerrit.cloudera.org:8080/7477
> [2]
> https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a

Reply via email to