[
https://issues.apache.org/jira/browse/FLINK-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler reassigned FLINK-11463:
----------------------------------------
Assignee: (was: Chesnay Schepler)
> Rework end-to-end tests in Java
> -------------------------------
>
> Key: FLINK-11463
> URL: https://issues.apache.org/jira/browse/FLINK-11463
> Project: Flink
> Issue Type: New Feature
> Components: Test Infrastructure
> Reporter: Chesnay Schepler
> Priority: Major
>
> This is the (long-term) umbrella issue for reworking our end-to-end tests in Java
> on top of a new set of utilities.
> Below are some areas where problems have been identified that I want to
> address with a prototype soon. This prototype primarily aims to introduce
> certain patterns to be built upon in the future.
> h2. Environments
> h4. Problem
> Our current tests work directly against flink-dist and set up local clusters
> with or without HA. Similar issues apply to Kafka and Elasticsearch.
> This prevents us from re-using tests for other environments (Yarn, Docker)
> and distributed settings.
> We also frequently have issues with cleaning up resources as it is the
> responsibility of the test itself.
> h4. Proposal
> Introduce a common interface for a given resource type (e.g. Flink, Kafka)
> that tests will work against.
> These resources should be implemented as JUnit external resources to allow
> reasonable life-cycle management.
> Tests get access to an instance of this resource through a factory method.
> Each resource implementation has a dedicated factory that is loaded with a
> {{ServiceLoader}}. Factories evaluate system-properties to determine whether
> the implementation should be loaded, and then optionally configure the
> resource.
> Example:
> {code}
> public interface FlinkResource {
>     // ... common methods ...
>
>     /**
>      * Returns the configured FlinkResource implementation, or a
>      * {@link LocalStandaloneFlinkResource} if none is configured.
>      *
>      * @return configured FlinkResource, or {@link LocalStandaloneFlinkResource}
>      *         if none is configured
>      */
>     static FlinkResource get() {
>         // load factories
>         // evaluate system properties
>         // return instance
>     }
> }
>
> public interface FlinkResourceFactory {
>     /**
>      * Returns a {@link FlinkResource} instance. If the instance could not be
>      * instantiated (for example, because a mandatory parameter was missing),
>      * then an empty {@link Optional} should be returned.
>      *
>      * @return FlinkResource instance, or an empty Optional if the instance
>      *         could not be instantiated
>      */
>     Optional<FlinkResource> create();
> }
> {code}
> As an example, running {{mvn verify -De2e.flink.mode=localStandalone}} could
> load a FlinkResource that sets up a local standalone cluster, while {{mvn
> verify -De2e.flink.mode=distributedStandalone -De2e.flink.hosts=...}} would
> connect to the given hosts and set up a distributed cluster.
> Tests are not _required_ to work against the common interface, and may be
> hard-wired to run against specific implementations. Simply put, the resource
> implementations should be public.
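
The factory lookup described in the proposal above could be sketched roughly as follows. This is only an illustration under assumptions, not the planned implementation: `describe()`, `LocalStandaloneFactory`, `DistributedStandaloneFactory` and `FlinkResources` are hypothetical names, and the resources are placeholders that do not start any cluster. Only the property names `e2e.flink.mode` and `e2e.flink.hosts` are taken from the example commands.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical stand-in for the proposed interface; describe() exists only for illustration.
interface FlinkResource {
    String describe();
}

interface FlinkResourceFactory {
    // Returns an empty Optional if this factory is not selected or cannot be configured.
    Optional<FlinkResource> create();
}

// Hypothetical factory: selected when e2e.flink.mode is unset or "localStandalone".
final class LocalStandaloneFactory implements FlinkResourceFactory {
    @Override
    public Optional<FlinkResource> create() {
        String mode = System.getProperty("e2e.flink.mode", "localStandalone");
        if (!"localStandalone".equals(mode)) {
            return Optional.empty();
        }
        FlinkResource resource = () -> "local standalone cluster";
        return Optional.of(resource);
    }
}

// Hypothetical factory: selected by e2e.flink.mode=distributedStandalone, requires e2e.flink.hosts.
final class DistributedStandaloneFactory implements FlinkResourceFactory {
    @Override
    public Optional<FlinkResource> create() {
        if (!"distributedStandalone".equals(System.getProperty("e2e.flink.mode"))) {
            return Optional.empty();
        }
        String hosts = System.getProperty("e2e.flink.hosts");
        if (hosts == null || hosts.isEmpty()) {
            return Optional.empty(); // mandatory parameter missing
        }
        FlinkResource resource = () -> "distributed standalone cluster on " + hosts;
        return Optional.of(resource);
    }
}

final class FlinkResources {
    // Returns the first resource any factory can create, else a local standalone fallback.
    static FlinkResource get(Iterable<FlinkResourceFactory> factories) {
        for (FlinkResourceFactory factory : factories) {
            Optional<FlinkResource> resource = factory.create();
            if (resource.isPresent()) {
                return resource.get();
            }
        }
        return () -> "local standalone cluster";
    }

    // Discovers factories registered under META-INF/services via ServiceLoader.
    static FlinkResource get() {
        return get(ServiceLoader.load(FlinkResourceFactory.class));
    }
}
```

Passing the factories in as an `Iterable` keeps the selection logic testable without service registration files; the no-argument `get()` simply hands the `ServiceLoader` (itself an `Iterable`) to the same code path.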
> h4. Future considerations
> The factory method may be extended to allow tests to specify a set of
> conditions that must be fulfilled, for example HA to be enabled. If this
> requirement cannot be fulfilled the test should be skipped.
> h2. Split Management
> h4. Problem
> End-to-end tests are run in separate {{cron-<version>-e2e}} branches. To
> accommodate the Travis time limits we run a total of 6 jobs each covering a
> subset of the tests.
> These so-called splits are currently managed in the respective branches, and
> not on master/release branches.
> This is a rather hidden detail that not everyone is aware of, nor is it
> easily discoverable. This has resulted several times in newly added tests not
> actually being run. Furthermore, if the arguments for tests are modified
> these changes have to be replicated to each branch.
> h4. Proposal
> Use JUnit Categories to assign each test explicitly to one of the Travis jobs.
> {code}
> @Category(TravisGroup1.class)
> public class MyTestRunningInTheFirstJob {
> ...
> }
> {code}
> It's a bit on the nose but a rather simple solution.
> A given group of tests could be executed by running {{mvn verify
> -Dcategories="org.apache.flink.tests.util.TravisGroup1"}}.
> All tests can be executed by running {{mvn verify -Dcategories=""}}.
> h4. Future considerations
> Tests may furthermore be categorized based on what they are testing (e.g.
> "Metrics", "Checkpointing", "Kafka") to allow running a certain subset of
> tests quickly.
> h2. Caching of downloaded artifacts
> h4. Problem
> Several tests download archives for setting up systems, like Kafka or
> Elasticsearch. We currently do not cache downloads in any way, resulting in
> less stable tests (as mirrors aren't always available) and overall increased
> test duration (since the downloads at times are quite slow). The duration
> issue becomes especially apparent when running tests in a loop for debugging
> or release-testing purposes.
> Finally, it also puts unnecessary strain on the download mirrors.
> h4. Proposal
> Add a {{DownloadCache}} interface with a single {{Path getOrDownload(String
> url, Path targetDir)}} method.
> Access to and loading of implementations are handled like resources (see
> above).
> The caching behavior is implementation-dependent.
> A reasonable implementation should allow files to be cached in a
> user-provided directory, with an optional time-to-live for long-term setups.
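
A directory-backed cache with an optional time-to-live, as described above, might look something like the sketch below. Only the `DownloadCache` name and the `getOrDownload(String url, Path targetDir)` signature come from the proposal; `DirectoryDownloadCache`, the cache file naming scheme and the `throws IOException` clause are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.time.Instant;

interface DownloadCache {
    Path getOrDownload(String url, Path targetDir) throws IOException;
}

// Hypothetical implementation that keeps downloads in a user-provided directory.
final class DirectoryDownloadCache implements DownloadCache {
    private final Path cacheDir;
    private final Duration timeToLive; // null means cached files never expire

    DirectoryDownloadCache(Path cacheDir, Duration timeToLive) {
        this.cacheDir = cacheDir;
        this.timeToLive = timeToLive;
    }

    @Override
    public Path getOrDownload(String url, Path targetDir) throws IOException {
        Files.createDirectories(cacheDir);
        Files.createDirectories(targetDir);

        String fileName = url.substring(url.lastIndexOf('/') + 1);
        // Prefix with a hash of the URL so equal file names from different URLs rarely collide.
        Path cached = cacheDir.resolve(Integer.toHexString(url.hashCode()) + "-" + fileName);

        if (!Files.exists(cached) || isExpired(cached)) {
            // Download to a temporary file first so a failed download never looks like a cache hit.
            Path tmp = Files.createTempFile(cacheDir, "download", ".tmp");
            try (InputStream in = URI.create(url).toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp);
            }
        }

        // Hand the test its own copy so it cannot corrupt the cached file.
        Path target = targetDir.resolve(fileName);
        Files.copy(cached, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    private boolean isExpired(Path cached) throws IOException {
        if (timeToLive == null) {
            return false;
        }
        Instant modified = Files.getLastModifiedTime(cached).toInstant();
        return modified.plus(timeToLive).isBefore(Instant.now());
    }
}
```

Copying the cached file into `targetDir` rather than returning the cached path is one possible design choice; it costs disk I/O but keeps tests from modifying shared cache entries.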