[ 
https://issues.apache.org/jira/browse/BEAM-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Perron updated BEAM-6475:
----------------------------------
    Description: 
We have a streaming pipeline running on Dataflow which writes data to a 
PostgreSQL instance hosted on Cloud SQL. This database suffers from 
recurring but unpredictable increases in open connections, with no apparent 
cause.

The latest example occurred on Friday 18 January 2019 (see attached file).

(The spike in the middle is unrelated to this issue; it comes from a periodic 
batch pipeline.)

The GCP logs show the following warning at the same time as the connection 
increases:

_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
SessionData have well defined equals method. This may produce incorrect results 
on some PipelineRunner_

This log line appears 13 times within a very short interval, between 
05:52:11.067 and 05:52:11.126.

SessionData is a custom class that implements java.io.Serializable. Instances 
are written to the PostgreSQL database using:

_pipeline.apply(JdbcIO.<SessionData>write()_
 
_.withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(options.instance,
 options.db_login,options.db_password))_
 _.withStatement("SQL_STATEMENT")_
 _.withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));_

According to pg_stat_activity on the PostgreSQL instance, all connections are 
in use. The query _select * from pg_stat_activity where state = 'idle' and 
query = 'ROLLBACK';_ returns no rows.
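
For reference, the warning quoted above appears to be emitted when Beam's 
SerializableCoder is asked to encode a class that does not override equals(). 
A minimal sketch of a SessionData with value-based equals() and hashCode() 
follows. The real class is not shown in this report, so the fields sessionId 
and durationMs are hypothetical placeholders. Overriding these methods would 
be expected to silence the warning, but whether the warning is related to the 
connection increases at all is not established.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the SessionData class mentioned in this report.
// The field names (sessionId, durationMs) are illustrative assumptions only.
final class SessionData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sessionId;
    private final long durationMs;

    SessionData(String sessionId, long durationMs) {
        this.sessionId = sessionId;
        this.durationMs = durationMs;
    }

    String getSessionId() {
        return sessionId;
    }

    long getDurationMs() {
        return durationMs;
    }

    // Value-based equality: two instances are equal when all fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SessionData)) {
            return false;
        }
        SessionData that = (SessionData) other;
        return durationMs == that.durationMs
            && Objects.equals(sessionId, that.sessionId);
    }

    // Must stay consistent with equals(): equal objects share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(sessionId, durationMs);
    }
}
```

With this in place, two SessionData instances built from the same values 
compare as equal, which is the property the warning says it cannot verify.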

  was:
We have a streaming pipeline running on Dataflow which writes data to a 
PostgreSQL instance hosted on Cloud SQL. This database suffers from 
recurring but unpredictable increases in open connections, with no apparent 
cause.

The latest example occurred on Friday 18 January 2019:

Unable to render embedded object:

 !image-2019-01-21-09-38-43-284.png!


(The spike in the middle is unrelated to this issue; it comes from a periodic 
batch pipeline.)

The GCP logs show the following warning at the same time as the connection 
increases:

_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
SessionData have well defined equals method. This may produce incorrect results 
on some PipelineRunner_

This log line appears 13 times within a very short interval, between 
05:52:11.067 and 05:52:11.126.

SessionData is a custom class that implements java.io.Serializable. Instances 
are written to the PostgreSQL database using:

_pipeline.apply(JdbcIO.<SessionData>write()_
 
_.withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(options.instance,
 options.db_login,options.db_password))_
 _.withStatement("SQL_STATEMENT")_
 _.withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));_

According to pg_stat_activity on the PostgreSQL instance, all connections are 
in use. The query _select * from pg_stat_activity where state = 'idle' and 
query = 'ROLLBACK';_ returns no rows.


> SQL Connection leak when using streaming pipeline
> -------------------------------------------------
>
>                 Key: BEAM-6475
>                 URL: https://issues.apache.org/jira/browse/BEAM-6475
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-jdbc
>    Affects Versions: 2.7.0
>            Reporter: Jonathan Perron
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>         Attachments: connections_2019_01_19.png
>
>
> We have a streaming pipeline running on Dataflow which writes data to a 
> PostgreSQL instance hosted on Cloud SQL. This database suffers from 
> recurring but unpredictable increases in open connections, with no apparent 
> cause.
> The latest example occurred on Friday 18 January 2019 (see attached file).
> (The spike in the middle is unrelated to this issue; it comes from a 
> periodic batch pipeline.)
> The GCP logs show the following warning at the same time as the connection 
> increases:
> _2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
> SessionData have well defined equals method. This may produce incorrect 
> results on some PipelineRunner_
> This log line appears 13 times within a very short interval, between 
> 05:52:11.067 and 05:52:11.126.
> SessionData is a custom class that implements java.io.Serializable. 
> Instances are written to the PostgreSQL database using:
> _pipeline.apply(JdbcIO.<SessionData>write()_
>  
> _.withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(options.instance,
>  options.db_login,options.db_password))_
>  _.withStatement("SQL_STATEMENT")_
>  _.withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));_
> According to pg_stat_activity on the PostgreSQL instance, all connections 
> are in use. The query _select * from pg_stat_activity where state = 'idle' 
> and query = 'ROLLBACK';_ returns no rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
