[ 
https://issues.apache.org/jira/browse/BEAM-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-2659:
--------------------------------
    Labels:   (was: stale-P2)

> JdbcIOIT flaky when run using io-it-suite-local
> -----------------------------------------------
>
>                 Key: BEAM-2659
>                 URL: https://issues.apache.org/jira/browse/BEAM-2659
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Stephen Sisk
>            Priority: P3
>
> Note: the problem below *should not* affect io-it-suite, and thus the jdbc 
> jenkins job that's currently in PR. I haven't tested those exact 
> configurations, so I'm not 100% certain, but I don't have any indication that 
> there'll be a problem.
> ---
>  I've been running the postgres kubernetes scripts locally for a while and 
> haven't seen any problems.
> However, now that I'm running it via io-it-suite-local, I'm starting to 
> see flakiness - clients attempting to connect to the postgres server get a 
> "connection attempt failed" error. The difference between what was working 
> before and now is that the load balancer and the pod are now getting set up 
> at the same time. Before, I was using a pre-existing load balancer - 
> that is, I wasn't tearing down/starting up the load balancer+pod at run 
> time.
> So I think the problem is in the interaction between the two, or potentially 
> just in the LoadBalancer service (it may take a little longer to get 
> fully hooked up even after it reports an IP).
> Possible causes:
> * the loadbalancer reports that it's ready before it can actually serve 
> traffic to the postgres instance
> * the loadbalancer has another status field that I'm not looking at - today 
> we only check the IP address; perhaps the loadbalancer exposes a readiness 
> status? kubectl get/describe might be able to help. A cursory examination 
> didn't show anything helpful.
> * the postgres instance isn't actually ready when it says it is. I don't 
> think that's the issue, since I was working with postgres pods before and they 
> seemed fine then.
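> If the first cause is right - the loadbalancer reporting an IP before it can 
> actually pass traffic - one cheap client-side mitigation is to keep polling 
> the endpoint until a TCP connect actually succeeds, rather than trusting the 
> reported IP. A minimal sketch in Python (the helper name and timeouts are my 
> own, nothing pkb currently provides):

```python
import socket
import time

def wait_for_tcp(host, port, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Returns True once a connect() completes, False if the deadline passes.
    This only proves the loadbalancer is forwarding packets, not that
    postgres is fully up - but it covers the "IP reported before routing
    works" gap suspected above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while the route/port is dead
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

> (A successful connect still doesn't rule out the third cause, a slow 
> postgres start - it only de-flakes the loadbalancer half.)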
> Potential solutions: 
> * if the cause is a slow postgres pod start (unlikely): determine postgres pod 
> health by reading from sql? (pg_ctl?), and then have pkb wait for that by 
> adding a dynamic_pipeline_option that waits for the kubernetes status to be 
> okay and sends it to a non-existent pipeline option
> * file a bug about the loadbalancer not being ready when it says it is? 
> (investigate that more :)
> * have some way for pkb to actually connect to and validate the connection to 
> postgres (that seems complicated.)
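> Whichever probe we settle on (a TCP connect, a SELECT 1 over jdbc, 
> pg_isready, ...), the waiting logic itself is the same: retry the check with 
> backoff until it passes or we give up. A sketch of that generic wrapper - 
> the name and defaults are hypothetical, not pkb API:

```python
import time

def retry_until(check, attempts=10, delay=1.0, backoff=2.0):
    """Call check() until it returns truthy, with exponential backoff.

    Returns the number of attempts used on success; raises RuntimeError
    if every attempt fails. check() is whatever health probe we pick.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            time.sleep(delay)
            delay *= backoff  # back off so we don't hammer a starting pod
    raise RuntimeError("health check never passed after %d attempts" % attempts)
```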
> If the problem is that the loadbalancer is not ready when it says it is, 
> then while we wait for kubernetes to fix the issue, one workaround would be 
> to:
> 1) modify io-it-suite-local to not load any kubernetes scripts (set 
> --beam_kubernetes_scripts to a blank line, or skip it altogether - 
> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
> 2) have the user run the kubernetes scripts manually beforehand, wait for them 
> to be healthy, and then run io-it-suite-local 
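> The "wait for them to be healthy" step can at least be scripted as far as 
> the loadbalancer address goes: poll `kubectl get service <name> -o json` 
> until .status.loadBalancer.ingress carries an address. A sketch of the 
> parsing half (the service name and field handling are assumptions based on 
> the standard Service status shape):

```python
import json

def loadbalancer_address(kubectl_json):
    """Extract the external address a LoadBalancer service reports.

    kubectl_json is the output of: kubectl get service <name> -o json
    Returns the ingress IP (or hostname) string, or None while kubernetes
    is still provisioning - which is exactly the window this bug lives in.
    """
    svc = json.loads(kubectl_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        addr = entry.get("ip") or entry.get("hostname")
        if addr:
            return addr
    return None
```

> Per the suspicion above, an address appearing here still may not mean the 
> loadbalancer can pass traffic, so a real connection check afterwards would 
> still be prudent.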
> To repro the problem:
> mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc 
> -DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py" 
> -DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]' 
> -DforceDirectRunner=true
> This should fail when run repeatedly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
