Jonathan Perron created BEAM-6475:
-------------------------------------
Summary: SQL Connection leak when using streaming pipeline
Key: BEAM-6475
URL: https://issues.apache.org/jira/browse/BEAM-6475
Project: Beam
Issue Type: Bug
Components: io-java-jdbc
Affects Versions: 2.7.0
Reporter: Jonathan Perron
Assignee: Jean-Baptiste Onofré
We have a streaming pipeline running on Dataflow which writes data to a
PostgreSQL instance hosted on Cloud SQL. The number of open connections to
this database increases recurrently but unpredictably, with no apparent
cause.
The latest example occurred on Friday, 18 January 2019:
!image-2019-01-21-08-50-13-932.png!
(The spike in the middle is unrelated to this issue as it belongs to a periodic
batch pipeline).
Investigating the GCP logs shows the following warning, logged at the same
time as the connection increase:
_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type
SessionData have well defined equals method. This may produce incorrect results
on some PipelineRunner_
This log line appears 13 times within a very short interval, between
05:52:11.067 and 05:52:11.126.
SessionData is a custom class which implements java.io.Serializable. Its
instances are written to the PostgreSQL database using:
_pipeline.apply(JdbcIO.<SessionData>write()_
_.withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(options.instance,
options.db_login,options.db_password))_
_.withStatement("SQL_STATEMENT")_
_.withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));_
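For reference, the warning quoted above appears to be emitted when a class encoded through Java serialization does not override equals. Whether it is related to the connection increase is not established. A minimal sketch of a SessionData class that defines equals and hashCode could look like the following; the field names sessionId and startedAtMillis are hypothetical, since the real class is not shown in this report.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the reporter's SessionData class.
// The two fields below are assumptions made for illustration only.
final class SessionData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sessionId;      // assumed field
    private final long startedAtMillis;  // assumed field

    SessionData(String sessionId, long startedAtMillis) {
        this.sessionId = sessionId;
        this.startedAtMillis = startedAtMillis;
    }

    // Value-based equality: two instances are equal when all fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SessionData)) {
            return false;
        }
        SessionData that = (SessionData) other;
        return startedAtMillis == that.startedAtMillis
                && Objects.equals(sessionId, that.sessionId);
    }

    // hashCode must be consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(sessionId, startedAtMillis);
    }
}
```

With such a definition, two SessionData instances built from the same values compare equal, which is what the warning is asking to be able to verify.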
Looking at pg_stat_activity on the PostgreSQL instance, all connections are in
use. However, running _select * from pg_stat_activity where state = 'idle' and
query = 'ROLLBACK';_ returns no rows.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)