kennknowles opened a new issue, #18432:
URL: https://github.com/apache/beam/issues/18432
Note: the problem below *should not* affect io-it-suite and thus the jdbc
jenkins job that's currently in PR. I haven't tested that exact configurations
so I'm not 100% certain, but I don't have any indications that there'll be a
problem.
\---
I've been running the postgres kubernetes scripts locally for a while and
haven't seen any problems.
However, now that I'm running it via io-it-suite-local, and I'm starting to
see flakiness - clients attempting to connect to the postgres server will get
"connection attempt failed" error. The difference between what was working
before and now is that now the load balancer and the pod are getting set up at
the same time. Before I was using it with a pre-existing load balancer - that
is I haven't been tearing down/starting up the load balancer****pod at run time.
So I think the problem is in the interaction between the two or potentially
just in the LoadBalancer service (it may take a little bit longer to get fully
hooked up even after it reports an IP)
Possible causes:
* the loadbalancer is reporting it's ready before it actually can serve
traffic to the postgres instance
* the lodabalancer has another status field that I'm not looking at - today
we only check IP address, perhaps the loadbalancer exposes a status field?
kubectl get/describe might be able to help. A cursory examination didn't show
anything helpful.
* the postgres instance isn't actually ready when it says it is. I don't
think that's the issue since I was working with postgres pods before and they
seemed fine then
Potential solutions:
* if cause is slow postgres pod start (unlikely): determine postgres pod
health by reading from sql? (pg_ctl?), and then have pkb wait for that by
adding a dynamic_pipeline_option that wait for the kubernetes status to be okay
and sends to a non-existent pipeline option
* file bug about loadbalancer not being ready when it says it is?
(investigate that more :)
* have some way for pkb to actually connect to and validate the connection
to postgres (that seems complicated.)
If the problem is that the loadbalancer is not ready when it says it is,
while we are waiting for kubernetes to fix the issue, one workaround would be
to:
1) modify io-it-suite-local to not load any kubernetes scripts (set
\--beam_kubernetes_scripts equal to blank line or skip it altogether -
https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
2) have the user run the kubernetes scripts manually beforehand, wait for it
to be healthy, and then run io-it-suite-local
To repro the problem:
mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc
-DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py"
-DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]("--tempRoot=gs://sisk-test/staging")'
-DforceDirectRunner=true
this should fail when run repeatedly.
Imported from Jira
[BEAM-2659](https://issues.apache.org/jira/browse/BEAM-2659). Original Jira may
contain additional context.
Reported by: sisk.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]