Hello, I recently created a streaming Google Dataflow program with Apache 
Beam using the Java SDK. When files land in Cloud Storage they fire off Pub/Sub 
messages containing the filename, which I consume and then write to a Cloud SQL 
database. Everything works great for the most part. However, I've been testing 
it more thoroughly and noticed that when I start reading in multiple files, the 
number of database connections slowly grows until it hits the default limit of 
100 connections. Strangely, the idle connections never seem to disappear, and 
since the program might run for hours watching for Pub/Sub messages, this 
creates a problem. 

My initial idea was to create a c3p0 connection pool and pass that in as the 
datasource through the JdbcIO.DataSourceConfiguration.create method. To my 
surprise this made no difference, even with aggressive pool settings. After 
some debugging I noticed that the datasource was still being wrapped in a 
pooling datasource, even though it already is a pooled datasource. I wondered 
what strangeness this caused, so locally I hacked JdbcIO to just return my c3p0 
datasource and do nothing else in the buildDatasource method ( 
https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 - line 331). That seemed to alleviate the connection problems: I now see the 
idle connections slowly start disappearing in Cloud SQL, and everything appears 
to be working smoothly. Obviously this isn't the solution I want moving forward. 
Is there some other way to achieve this? What grave mistakes have I made by 
bypassing the standard way of doing it?
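
To make my suspicion concrete, here is a toy model in plain Java of what I 
think might be going on. It is only a sketch and not the actual JdbcIO, DBCP, 
or c3p0 code: InnerPool, OuterPool, and every method name are made up for 
illustration. The assumption is that the wrapping pool keeps returned 
connections in its own idle list and never hands them back to the pool it 
wraps, so the wrapped pool's idle eviction never sees them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: hypothetical classes, not the JdbcIO, DBCP, or c3p0 implementation. */
final class DoublePoolSketch {

    /** Stands in for my c3p0 pool: opens connections on demand and can evict idle ones. */
    static final class InnerPool {
        private final Deque<Integer> idle = new ArrayDeque<>();
        private int open = 0;   // connections currently open against the database
        private int nextId = 0; // id given to the next newly opened connection

        int checkOut() {
            if (!idle.isEmpty()) {
                return idle.pop();
            }
            open++;
            return nextId++;
        }

        void checkIn(int conn) {
            idle.push(conn);
        }

        /** Closes every connection this pool sees as idle; returns how many were closed. */
        int evictIdle() {
            int closed = idle.size();
            idle.clear();
            open -= closed;
            return closed;
        }

        int openCount() {
            return open;
        }
    }

    /** Stands in for the wrapping pool (assumed behaviour): parks returned connections itself. */
    static final class OuterPool {
        private final InnerPool inner;
        private final Deque<Integer> idle = new ArrayDeque<>();

        OuterPool(InnerPool inner) {
            this.inner = inner;
        }

        int checkOut() {
            return idle.isEmpty() ? inner.checkOut() : idle.pop();
        }

        void checkIn(int conn) {
            idle.push(conn); // the inner pool still sees this connection as checked out
        }
    }

    /** Borrows and returns n connections on the inner pool directly, then evicts idle ones. */
    static int openAfterEvictionDirect(int n) {
        InnerPool inner = new InnerPool();
        int[] conns = new int[n];
        for (int i = 0; i < n; i++) conns[i] = inner.checkOut();
        for (int c : conns) inner.checkIn(c);
        inner.evictIdle();
        return inner.openCount();
    }

    /** Same workload, but every borrow and return goes through the wrapping pool. */
    static int openAfterEvictionWrapped(int n) {
        InnerPool inner = new InnerPool();
        OuterPool outer = new OuterPool(inner);
        int[] conns = new int[n];
        for (int i = 0; i < n; i++) conns[i] = outer.checkOut();
        for (int c : conns) outer.checkIn(c);
        inner.evictIdle();
        return inner.openCount();
    }

    public static void main(String[] args) {
        System.out.println("open after eviction, direct:  " + openAfterEvictionDirect(5));
        System.out.println("open after eviction, wrapped: " + openAfterEvictionWrapped(5));
    }
}
```

In this model the direct case ends with no open connections after eviction, 
while the wrapped case keeps all of them open, which at least matches what I 
see in Cloud SQL. Whether the real wrapper behaves this way is exactly what I 
am unsure about.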




