[
https://issues.apache.org/jira/browse/BEAM-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Perron updated BEAM-6475:
----------------------------------
Description:
We have a streaming pipeline running on Dataflow which writes data to a
PostgreSQL instance hosted on Cloud SQL. The number of open connections to
this database spikes at irregular intervals for no apparent reason.
The latest example was on Friday 18 January 2019 (see attached file).
(The spike in the middle is unrelated to this issue as it belongs to a periodic
batch pipeline).
The GCP logs show the following warning at the same time as the connection
increases:
_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type
SessionData have well defined equals method. This may produce incorrect results
on some PipelineRunner_
This log line appears 13 times in a very short interval, between
05:52:11.067 and 05:52:11.126.
SessionData is a custom class that implements java.io.Serializable. Instances
are written to the PostgreSQL database using:
_pipeline.apply(JdbcIO.<SessionData>write()_
_.withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(options.instance,
options.db_login,options.db_password))_
_.withStatement("SQL_STATEMENT")_
_.withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));_
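On the warning quoted above: it appears to be emitted by Beam's
SerializableCoder when the element class does not override equals(). It may
well be unrelated to the connection increase, but defining equals() and
hashCode() on SessionData should make the warning go away. A minimal sketch
follows; the field names (sessionId, startMillis) are hypothetical, since the
real fields of SessionData are not part of this report.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical shape of SessionData: the real fields are not shown in this
// report, so sessionId and startMillis are assumptions for illustration only.
final class SessionData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sessionId;  // assumed field
    private final long startMillis;  // assumed field

    SessionData(String sessionId, long startMillis) {
        this.sessionId = sessionId;
        this.startMillis = startMillis;
    }

    // Value-based equality: two instances with the same field values are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SessionData)) {
            return false;
        }
        SessionData that = (SessionData) other;
        return startMillis == that.startMillis
                && Objects.equals(sessionId, that.sessionId);
    }

    // Must be consistent with equals(): equal objects yield equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(sessionId, startMillis);
    }
}
```

With this in place, two SessionData instances built from the same values
compare equal, which is the property the warning says cannot be verified.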
Looking at pg_stat_activity on the PostgreSQL instance, all connections are in
use. Running _select * from pg_stat_activity where state = 'idle' and query =
'ROLLBACK';_ returns no rows.
> SQL Connection leak when using streaming pipeline
> -------------------------------------------------
>
> Key: BEAM-6475
> URL: https://issues.apache.org/jira/browse/BEAM-6475
> Project: Beam
> Issue Type: Bug
> Components: io-java-jdbc
> Affects Versions: 2.7.0
> Reporter: Jonathan Perron
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Attachments: connections_2019_01_19.png
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)