Stephen Sisk created BEAM-2659:
----------------------------------
Summary: JdbcIOIT flaky when run using io-it-suite-local
Key: BEAM-2659
URL: https://issues.apache.org/jira/browse/BEAM-2659
Project: Beam
Issue Type: Bug
Components: testing
Reporter: Stephen Sisk
Assignee: Chamikara Jayalath
Note: the problem below *should not* affect io-it-suite and thus the jdbc
jenkins job that's currently in PR. I haven't tested that exact configuration,
so I'm not 100% certain, but I don't have any indications that there'll be a
problem.
---
I've been running the postgres kubernetes scripts locally for a while and
haven't seen any problems.
However, now that I'm running them via io-it-suite-local, I'm starting to see
flakiness - clients attempting to connect to the postgres server get a
"connection attempt failed" error. The difference between what was working
before and now is that the load balancer and the pod are now being set up at
the same time. Before, I was using a pre-existing load balancer - that is, I
wasn't tearing down/starting up the load balancer+pod at run time.
So I think the problem is in the interaction between the two, or potentially
just in the LoadBalancer service (it may take a little longer to get fully
hooked up even after it reports an IP).
Possible causes:
* the loadbalancer is reporting it's ready before it can actually serve
traffic to the postgres instance
* the loadbalancer has another status field that I'm not looking at - today we
only check the IP address. kubectl get/describe might be able to help, but a
cursory examination didn't show anything helpful.
* the postgres instance isn't actually ready when it says it is. I don't think
that's the issue, since I was working with postgres pods before and they
seemed fine then.
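On the "another status field" cause: in the standard Kubernetes Service API, the only load-balancer readiness signal is `status.loadBalancer.ingress`, which is exactly the IP check we already do. A minimal Python sketch of that check (the JSON shape follows the standard Service API; there is no further readiness field to consult, which is consistent with the cursory examination above):

```python
def loadbalancer_ingress_ip(service_json):
    """Given parsed `kubectl get service <name> -o json` output, return the
    load balancer ingress IP, or None if one has not been assigned yet.

    Note: the Service status only reports that an ingress IP was assigned;
    it says nothing about whether traffic is actually routable yet, which
    would explain an IP appearing before connections succeed.
    """
    ingress = service_json.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if "ip" in entry:
            return entry["ip"]
    return None

# A freshly created service has an empty loadBalancer status:
pending = {"status": {"loadBalancer": {}}}
assigned = {"status": {"loadBalancer": {"ingress": [{"ip": "10.0.0.5"}]}}}
print(loadbalancer_ingress_ip(pending))   # None
print(loadbalancer_ingress_ip(assigned))  # 10.0.0.5
```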
Potential solutions:
* if the cause is a slow postgres pod start (unlikely): determine postgres pod
health by reading from sql (pg_ctl?), and then have pkb wait for that by
adding a dynamic_pipeline_option that waits for the kubernetes status to be
okay and sends it to a non-existent pipeline option
* file a bug about the loadbalancer not being ready when it says it is?
(investigate that more :)
* have some way for pkb to actually connect to and validate the connection to
postgres (that seems complicated.)
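The "validate the connection" option doesn't have to mean a full SQL round trip. A plain TCP connect to the reported IP on the postgres port already distinguishes "IP assigned" from "load balancer actually routing". A hedged sketch (the helper name and polling parameters are mine, not anything pkb currently has):

```python
import socket
import time

def wait_for_tcp(host, port, timeout_secs=60, interval_secs=2):
    """Poll until a TCP connection to host:port succeeds, or raise.

    A successful TCP connect is a cheap proxy for "the load balancer is
    routing traffic to postgres". A stricter check would log in and run a
    query, but even this catches the not-yet-routable window.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_secs):
                return  # connection accepted; endpoint is reachable
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    "%s:%d not reachable after %ds" % (host, port, timeout_secs))
            time.sleep(interval_secs)
```

pkb (or whatever launches the test) could call this with the loadbalancer IP and port 5432 before handing the connection string to the pipeline.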
If the problem is that the loadbalancer is not ready when it says it is, while
we are waiting for kubernetes to fix the issue, one workaround would be to:
1) modify io-it-suite-local to not load any kubernetes scripts (set
--beam_kubernetes_scripts to a blank line or skip it altogether -
https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
2) have the user run the kubernetes scripts manually beforehand, wait for them
to be healthy, and then run io-it-suite-local
To repro the problem:
mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc
-DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py"
-DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]'
-DforceDirectRunner=true
This should fail when run repeatedly.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)