[ https://issues.apache.org/jira/browse/BEAM-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Perron updated BEAM-6475:
----------------------------------
    Description: 
We have a streaming pipeline running on Dataflow which writes data to a 
PostgreSQL instance hosted on Cloud SQL. The number of connections to this 
database increases repeatedly, at unpredictable times and with no apparent 
cause.

The latest example occurred on Friday, 18 January 2019:

 !image-2019-01-21-09-38-43-284.png!


(The spike in the middle is unrelated to this issue as it belongs to a periodic 
batch pipeline).

The GCP logs show the following warning at the same time as the connection 
increases:

_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
SessionData have well defined equals method. This may produce incorrect results 
on some PipelineRunner_

This log line appears 13 times within a very short interval, between 
05:52:11.067 and 05:52:11.126.
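
The wording of the warning suggests that SessionData does not override 
equals. As a minimal sketch of what a well-defined equals method could look 
like, assuming two hypothetical fields (sessionId and durationMillis, since the 
real fields of SessionData are not shown in this report):

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the reporter's SessionData class.
// The field names below are assumptions made for illustration only.
final class SessionData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sessionId;    // assumed field
    private final long durationMillis; // assumed field

    SessionData(String sessionId, long durationMillis) {
        this.sessionId = sessionId;
        this.durationMillis = durationMillis;
    }

    String getSessionId() {
        return sessionId;
    }

    long getDurationMillis() {
        return durationMillis;
    }

    // Value-based equality: instances with the same field values are equal,
    // so a copy rebuilt from its serialized form compares equal to the original.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SessionData)) {
            return false;
        }
        SessionData that = (SessionData) other;
        return durationMillis == that.durationMillis
            && Objects.equals(sessionId, that.sessionId);
    }

    // hashCode is kept consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(sessionId, durationMillis);
    }
}
```

Whether this warning is related to the connection increases is not established 
here; the sketch only addresses the condition the warning describes.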

SessionData is a custom class which implements java.io.Serializable. Its 
instances are written to the PostgreSQL database using:

    pipeline.apply(JdbcIO.<SessionData>write()
        .withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(
            options.instance, options.db_login, options.db_password))
        .withStatement("SQL_STATEMENT")
        .withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));

According to pg_stat_activity on the PostgreSQL instance, all connections are 
in use. The query _select * from pg_stat_activity where state = 'idle' and 
query = 'ROLLBACK';_ returns no rows.
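
A breakdown of pg_stat_activity by state can complement the single filtered 
query above. The following is a minimal sketch using plain JDBC; the class 
ConnectionStateSummary and its methods are hypothetical names, not part of Beam 
or of the pipeline described here, and fetchStates assumes an already open 
connection to the PostgreSQL instance:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for inspecting connection states.
final class ConnectionStateSummary {
    private ConnectionStateSummary() {}

    // Reads the state column of pg_stat_activity over an open JDBC connection.
    static List<String> fetchStates(Connection connection) throws SQLException {
        List<String> states = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery("select state from pg_stat_activity")) {
            while (rows.next()) {
                // The state can be null, for example for background processes.
                states.add(rows.getString("state"));
            }
        }
        return states;
    }

    // Counts sessions per state; null states are grouped under "unknown".
    static Map<String, Integer> countByState(List<String> states) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String state : states) {
            counts.merge(state == null ? "unknown" : state, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling countByState(fetchStates(connection)) at intervals during a spike would 
show which state the additional connections are in.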

  was:
We have a streaming pipeline running on Dataflow which writes data to a 
PostgreSQL instance hosted on Cloud SQL. The number of connections to this 
database increases repeatedly, at unpredictable times and with no apparent 
cause.

The latest example occurred on Friday, 18 January 2019:

!image-2019-01-21-08-50-13-932.png!

(The spike in the middle is unrelated to this issue as it belongs to a periodic 
batch pipeline).

The GCP logs show the following warning at the same time as the connection 
increases:

_2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
SessionData have well defined equals method. This may produce incorrect results 
on some PipelineRunner_

This log line appears 13 times within a very short interval, between 
05:52:11.067 and 05:52:11.126.

SessionData is a custom class which implements java.io.Serializable. Its 
instances are written to the PostgreSQL database using:

    pipeline.apply(JdbcIO.<SessionData>write()
        .withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(
            options.instance, options.db_login, options.db_password))
        .withStatement("SQL_STATEMENT")
        .withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));

According to pg_stat_activity on the PostgreSQL instance, all connections are 
in use. The query _select * from pg_stat_activity where state = 'idle' and 
query = 'ROLLBACK';_ returns no rows.


> SQL Connection leak when using streaming pipeline
> -------------------------------------------------
>
>                 Key: BEAM-6475
>                 URL: https://issues.apache.org/jira/browse/BEAM-6475
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-jdbc
>    Affects Versions: 2.7.0
>            Reporter: Jonathan Perron
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>         Attachments: connections_2019_01_19.png
>
>
> We have a streaming pipeline running on Dataflow which writes data to a 
> PostgreSQL instance hosted on Cloud SQL. The number of connections to this 
> database increases repeatedly, at unpredictable times and with no apparent 
> cause.
> The latest example occurred on Friday, 18 January 2019:
>  !image-2019-01-21-09-38-43-284.png!
> (The spike in the middle is unrelated to this issue as it belongs to a 
> periodic batch pipeline).
> The GCP logs show the following warning at the same time as the connection 
> increases:
> _2019-01-18 05:52:11.067 HNEC Can't verify serialized elements of type 
> SessionData have well defined equals method. This may produce incorrect 
> results on some PipelineRunner_
> This log line appears 13 times within a very short interval, between 
> 05:52:11.067 and 05:52:11.126.
> SessionData is a custom class which implements java.io.Serializable. Its 
> instances are written to the PostgreSQL database using:
>     pipeline.apply(JdbcIO.<SessionData>write()
>         .withDataSourceConfiguration(ExtractFunctions.getDataSourceConfiguration(
>             options.instance, options.db_login, options.db_password))
>         .withStatement("SQL_STATEMENT")
>         .withPreparedStatementSetter(new InsertSessionPrepareStatementSetter()));
> According to pg_stat_activity on the PostgreSQL instance, all connections are 
> in use. The query _select * from pg_stat_activity where state = 'idle' and 
> query = 'ROLLBACK';_ returns no rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)