zentol commented on issue #10436: [FLINK-14920] [flink-end-to-end-perf-tests] 
Set up environment to run performance e2e tests
URL: https://github.com/apache/flink/pull/10436#issuecomment-592501122
 
 
   okay, here we go.
   
   Let me first walk you through some of the problems I saw in this PR:
   
   1) There is a weird inconsistency in the required SSH credentials; the 
library requires username/password, but to start the cluster we rely on 
`start-cluster.sh` which requires password-less SSH to be setup.
   This requires some further investigations/clarifications.
   
   2) The ssh utilities would be easier to use if there were a dedicated 
container for credentials.
   
   3) The used ssh library is really old (from 2006), and the latest release 
was in 2014; we may want to use another library.
   
   4) You've added a lot of additional functionality into the 
FlinkDistribution, which overloads it a bit, and it also breaks some API 
contracts.
   For example, in it's current form it should be possible to modify the 
configuration after `before()` was called, but these changes aren't applied to 
the remote hosts.
   
   5) The DistributedStandaloneFlinkResource kinda contains nothing, and may 
not work since it assumes at least one JM to be on the local host.
   
   ```
   
   ```
   
   As mentioned earlier, while working on this I noticed a few things that are 
awkward in the existing e2e code, and have filed a number of tickets and 
prepared solutions for them:
   
   With FLINK-16307 we now always rely on `start-cluster.sh` to start the 
cluster, which will be more consistent with the distributed setup.
   With FLINK-16190 tests no longer use the FlinkDistribution directly, which 
allows us to rework it into a simple wrapper.
   With FLINK-16189 the FlinkDistribution no longer contains any testing logic; 
parameters are passed in from the FlinkResource implementations instead. This 
makes it easier to re-use in ways that don't fit the junit lifecycle, like in 
our case creating a FlinkDistribution, mutating it and afterwards distributing 
it across several hosts.
   
   ```
   
   ```
   
   Based on the problems I saw in this PR and the filed tickets I have created 
a work-in-progress branch that shows how I envision the solution:
   https://github.com/zentol/flink/tree/14920_wip
   
   Let me outline some differences / decisions I made:
   
   The StandalondCluster-/-JobController are now a top-level class to simplify 
re-use by multiple FlinkResource implementations.
   
   All options specific to the distributed setup are now evaluated by the 
`DistributedFlinkResourceFactory`; this reduces the number of places where 
parameters are processed, which makes it easier to discover which options need 
to set.
   
   The `DistributedFlinkResource` takes care of the distributed setup. It 
mutates the local copy of the distribution (that is used to orchestrate the 
cluster startup / job submissions), distributes the distributions and has the 
logic for performing operations across multiple distributions.
   
   A remote distribution is represented by the `RemoteFlinkDistribution` class, 
a heavily stripped-down version of the `FlinkDistribution` that does not allow 
any modifications at this point. It is mostly used to copy/search logs and stop 
any processes running on this host (which may not even be required).
   
   The SSH utils are reworked into an `SshTool`, which acts both as a container 
for ssh configuration options (port, credentials) and access point to SSH 
method that can be passed around conveniently.
   
   I also switched to `com.hierynomus:sshj`, which appears to be better 
maintained and has fairly nice API.
   
   I left a number of TODO comments for things that are still missing.
   Please take a look and tell me what you think.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to