Hello, I have recently created a streaming Google Dataflow pipeline with Apache Beam using the Java SDK. When files land in Cloud Storage, they fire off Pub/Sub messages containing the filename, which I consume and then write to a Cloud SQL database. Everything works great for the most part. However, I've been testing it more thoroughly recently and noticed that when I start reading in multiple files, the number of database connections slowly grows and grows until it hits the default limit of 100 connections. Strangely, the idle connections never seem to disappear, and since the program might run for hours watching for Pub/Sub messages, this creates a problem.
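For context, the pipeline is shaped roughly like this (the subscription name, table, and JDBC URL are simplified stand-ins for my actual code, not something Beam prescribes):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.PCollection;

Pipeline p = Pipeline.create(options);

// Each Pub/Sub message carries the GCS path of a newly landed file.
PCollection<String> lines = p
    .apply(PubsubIO.readStrings()
        .fromSubscription("projects/my-project/subscriptions/my-sub"))
    .apply(FileIO.matchAll())       // resolve each filename to file metadata
    .apply(FileIO.readMatches())
    .apply(TextIO.readFiles());     // read the contents of each file

// Write each line to Cloud SQL via JdbcIO (placeholder URL/statement).
lines.apply(JdbcIO.<String>write()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
        "com.mysql.cj.jdbc.Driver", "jdbc:mysql://..."))
    .withStatement("INSERT INTO my_table (col) VALUES (?)")
    .withPreparedStatementSetter((line, stmt) -> stmt.setString(1, line)));
```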
My initial idea was to create a c3p0 connection pool and pass it in as the DataSource through the JdbcIO.DataSourceConfiguration.create method. This didn't seem to make a difference, which perplexed me even with my aggressive pool settings. After some debugging I noticed that my DataSource was still being wrapped in a pooling DataSource, even though it already is a pooled DataSource. I wondered what strangeness this caused, so locally I hacked JdbcIO to just return my c3p0 DataSource and do nothing else in the buildDatasource method ( https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java - line 331). That seemed to alleviate the connection problem, and now I can see the idle connections slowly disappearing in Cloud SQL. Everything appears to be working smoothly. Obviously this isn't the solution I want moving forward. Is there some other way to achieve this? What grave mistakes have I made by bypassing the standard way of doing it?
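For reference, this is roughly how I'm wiring the c3p0 pool into JdbcIO (the driver class, URL, and pool sizes are illustrative placeholders, not my exact values):

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;
import java.beans.PropertyVetoException;
import org.apache.beam.sdk.io.jdbc.JdbcIO;

ComboPooledDataSource pool = new ComboPooledDataSource();
pool.setDriverClass("com.mysql.cj.jdbc.Driver"); // throws PropertyVetoException
pool.setJdbcUrl("jdbc:mysql://...");             // placeholder Cloud SQL URL
pool.setMaxPoolSize(10);                         // aggressive cap on connections
pool.setMaxIdleTime(60);                         // reclaim idle connections after 60s

// Pass the pooled DataSource directly; JdbcIO still wraps it in its own
// pooling DataSource, which is the behavior I'm asking about.
lines.apply(JdbcIO.<String>write()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(pool))
    .withStatement("INSERT INTO my_table (col) VALUES (?)")
    .withPreparedStatementSetter((line, stmt) -> stmt.setString(1, line)));
```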