Hello!
My question is perhaps mainly GCP-oriented, so I apologize if it is not
fully relevant to the Beam community.
We have a streaming pipeline running on Dataflow which writes data to a
PostgreSQL instance hosted on Cloud SQL. This database is suffering from
connection leak spikes on a regular basis. The connections are kept alive
until the pipeline is canceled or drained.
We are writing to the database in two ways (both sketched below):
- with individual DoFns, where we open/close the connection using the standard
JDBC try/catch (SQLException ex)/finally statements;
- with a Pipeline.apply(JdbcIO.<SessionData>write()) operation.
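For reference, the two write paths look roughly like this (the SQL statement, the
SessionData handling and the jdbcUrl/user/password values are simplified placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.DoFn;

// 1) Manual JDBC inside a DoFn: a connection is opened and closed per element.
static class ManualJdbcWriteFn extends DoFn<SessionData, Void> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    Connection conn = null;
    try {
      conn = DriverManager.getConnection(jdbcUrl, user, password);
      try (PreparedStatement stmt =
          conn.prepareStatement("INSERT INTO sessions VALUES (?)")) {
        stmt.setString(1, c.element().toString());
        stmt.executeUpdate();
      }
    } catch (SQLException ex) {
      // log and move on
    } finally {
      if (conn != null) {
        try { conn.close(); } catch (SQLException ignored) { }
      }
    }
  }
}

// 2) JdbcIO-based write applied on the pipeline.
sessionData.apply(
    JdbcIO.<SessionData>write()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create("org.postgresql.Driver", jdbcUrl)
                .withUsername(user)
                .withPassword(password))
        .withStatement("INSERT INTO sessions VALUES (?)")
        .withPreparedStatementSetter(
            (element, stmt) -> stmt.setString(1, element.toString())));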
I have observed that these spikes happen most of the time after I/O errors
with the database. Has anyone observed the same situation?
I have several questions/observations; please correct me if I am wrong
(I do not come from the Java world, so some of these may seem pretty trivial):
- Does the apply method handle SQLExceptions or I/O errors?
- Would the use of a connection pool prevent such behaviour? If so,
how would one implement it so that all workers can use it? Could it be
implemented with JDBC connection pooling?
I am worried about serialization if one were to pass a Connection
object as an argument to a DoFn.
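For what it is worth, what I had in mind (I am not sure this is the recommended
approach) is to create the pool lazily once per worker in @Setup rather than
serializing any Connection, for example with HikariCP; the pool settings, class
names and SQL below are only illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.beam.sdk.transforms.DoFn;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

static class PooledWriteFn extends DoFn<SessionData, Void> {
  // The pool lives in a static field, so it is created once per worker JVM and is
  // never serialized with the DoFn; only plain configuration strings are serialized.
  private static volatile HikariDataSource pool;

  private final String jdbcUrl;
  private final String user;
  private final String password;

  PooledWriteFn(String jdbcUrl, String user, String password) {
    this.jdbcUrl = jdbcUrl;
    this.user = user;
    this.password = password;
  }

  @Setup
  public void setup() {
    if (pool == null) {
      synchronized (PooledWriteFn.class) {
        if (pool == null) {
          HikariConfig config = new HikariConfig();
          config.setJdbcUrl(jdbcUrl);
          config.setUsername(user);
          config.setPassword(password);
          config.setMaximumPoolSize(10); // bounds the connections opened per worker
          pool = new HikariDataSource(config);
        }
      }
    }
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // try-with-resources returns the connection to the pool even when the
    // statement throws, so an I/O error should not leak the connection.
    try (Connection conn = pool.getConnection();
        PreparedStatement stmt = conn.prepareStatement("INSERT INTO sessions VALUES (?)")) {
      stmt.setString(1, c.element().toString());
      stmt.executeUpdate();
    }
  }
}

Does something along these lines make sense, or is there a more idiomatic way
(e.g. within JdbcIO itself) to share a pooled DataSource across workers?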
Thank you in advance for your comments and reactions.
Best regards,
Jonathan