[
https://issues.apache.org/jira/browse/BEAM-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot updated BEAM-2659:
--------------------------------
Labels: (was: stale-P2)
> JdbcIOIT flaky when run using io-it-suite-local
> -----------------------------------------------
>
> Key: BEAM-2659
> URL: https://issues.apache.org/jira/browse/BEAM-2659
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Stephen Sisk
> Priority: P3
>
> Note: the problem below *should not* affect io-it-suite, and thus the jdbc
> jenkins job that's currently in PR. I haven't tested that exact
> configuration, so I'm not 100% certain, but I don't have any indication that
> there'll be a problem.
> ---
> I've been running the postgres kubernetes scripts locally for a while and
> haven't seen any problems.
> However, now that I'm running it via io-it-suite-local, I'm starting to
> see flakiness: clients attempting to connect to the postgres server get a
> "connection attempt failed" error. The difference between what was working
> before and now is that the load balancer and the pod are now set up
> at the same time. Before, I was using a pre-existing load balancer;
> that is, I wasn't tearing down/starting up the load balancer+pod at run
> time.
> So I think the problem is in the interaction between the two, or potentially
> just in the LoadBalancer service (it may take a little longer to get
> fully hooked up even after it reports an IP).
> Possible causes:
> * the loadbalancer is reporting it's ready before it actually can serve
> traffic to the postgres instance
> * the loadbalancer has another status field that I'm not looking at. Today
> we only check the IP address; perhaps the loadbalancer exposes a separate
> status field? kubectl get/describe might be able to help, but a cursory
> examination didn't show anything helpful.
> * the postgres instance isn't actually ready when it says it is. I don't
> think that's the issue since I was working with postgres pods before and they
> seemed fine then
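
For reference, the IP check described above can be sketched roughly as follows. This is a minimal sketch, not the actual pkb code: it assumes the Service object has been fetched as JSON (for example with `kubectl get service <name> -o json`) and reads the `status.loadBalancer.ingress` list, which is where Kubernetes reports the assigned address. The function name `load_balancer_addresses` is hypothetical. Note that a non-empty result only reproduces the existing check; it does not show that traffic can actually reach the pod.

```python
import json
from typing import List


def load_balancer_addresses(service_json: str) -> List[str]:
    """Return the ingress IPs or hostnames reported in a Service's status.

    Hypothetical helper: `service_json` is assumed to be the JSON document
    printed by `kubectl get service <name> -o json`.
    """
    service = json.loads(service_json)
    # The ingress list is absent or empty until an address has been assigned.
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    addresses = []
    for entry in ingress:
        # Providers report either an IP or a hostname for each ingress point.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses
```

An empty list means the load balancer has not reported an address yet.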
> Potential solutions:
> * if the cause is slow postgres pod start (unlikely): determine postgres pod
> health by reading from sql? (pg_ctl?), and then have pkb wait for that by
> adding a dynamic_pipeline_option that waits for the kubernetes status to be
> okay and sends it to a non-existent pipeline option
> * file bug about loadbalancer not being ready when it says it is?
> (investigate that more :)
> * have some way for pkb to actually connect to and validate the connection to
> postgres (that seems complicated.)
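
A lighter-weight variant of the last idea is to poll the load balancer address until a TCP connection succeeds before starting the test. The sketch below is only an illustration under assumptions: the function name `wait_for_tcp` and its parameters are hypothetical, it is not part of pkb, and a successful TCP connect only shows that something is accepting connections on the port, not that postgres can serve queries.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=120.0, interval_s=2.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper. Returns True once a connection is accepted and
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

With the address reported by the load balancer, a call might look like `wait_for_tcp(ip, 5432)`, 5432 being the default postgres port.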
> If the problem is that the loadbalancer is not ready when it says it is,
> while we are waiting for kubernetes to fix the issue, one workaround would be
> to:
> 1) modify io-it-suite-local to not load any kubernetes scripts (set
> --beam_kubernetes_scripts equal to blank line or skip it altogether -
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
> 2) have the user run the kubernetes scripts manually beforehand, wait for it
> to be healthy, and then run io-it-suite-local
> To repro the problem:
> mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc
> -DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py"
> -DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]'
> -DforceDirectRunner=true
> this should fail when run repeatedly.
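
Since the failure is intermittent, it may help to run the command several times and count the failures. The following is a small sketch for doing that; `count_failures` is a hypothetical helper, and it assumes the command signals failure through a non-zero exit code (as `mvn verify` does when the build fails).

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` (a list of arguments) `runs` times; return the failure count.

    Hypothetical helper: a run counts as failed when its exit code is non-zero.
    """
    failures = 0
    for attempt in range(1, runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
            print(f"run {attempt} failed with exit code {result.returncode}",
                  file=sys.stderr)
    return failures
```

The repro command above would be passed as a list of arguments, one list element per flag.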
--
This message was sent by Atlassian Jira
(v8.3.4#803005)