Dear all,
It happened again on Friday morning:
You can see a baseline in the number of connections from the 16th to the
18th. Looking at pg_stat_activity, all connections are in use, even
though the pipeline is not running at 100 % (my use case is processing
data from a platform which is not used much during the weekend).
I opened a ticket on the project JIRA:
https://issues.apache.org/jira/browse/BEAM-6475. I added as much
information as I could gather, but let me know if anything else is required.
Best regards,
Jonathan
On 18/01/2019 06:17, Kenneth Knowles wrote:
My mistake - using @Teardown in this way is a good approach. It may
sometimes not be executed, but as Reuven says, that means the process
died.
Kenn
On Thu, Jan 17, 2019 at 9:31 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi,
I don't think we have a connection leak in normal behavior.
The actual SQL statement is executed in @FinishBundle, where the
connection is closed.
@ProcessElement only adds records to the batch to process.
Does that mean an exception occurs during the batch addition?
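For readers following along, here is a minimal plain-Java sketch of the pattern described above (all names are hypothetical, and this is not the real JdbcIO code): records are buffered in processElement, and the batch is only executed, with the connection then closed, when the bundle finishes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the buffer-then-flush pattern JdbcIO is described as using.
public class BatchWriterSketch {
    /** Stand-in for a JDBC connection; only tracks open/closed state. */
    static class FakeConnection implements AutoCloseable {
        boolean open = true;
        @Override public void close() { open = false; }
    }

    final FakeConnection connection = new FakeConnection();
    private final List<String> batch = new ArrayList<>();

    /** Corresponds to @ProcessElement: just buffer the record. */
    public void processElement(String record) {
        batch.add(record);
    }

    /** Corresponds to @FinishBundle: execute the batch, then close.
     *  Even if the (elided) batch execution threw here, the finally
     *  block would still close the connection. */
    public void finishBundle() {
        try {
            // executeBatch() would run against the real connection here.
        } finally {
            connection.close();
            batch.clear();
        }
    }
}
```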
Regards
JB
On 17/01/2019 12:41, Alexey Romanenko wrote:
> Kenn,
>
> I’m not sure that we have a connection leak in JdbcIO, since a new
> connection is obtained from an instance of javax.sql.DataSource
> (created in @Setup), which is org.apache.commons.dbcp2.BasicDataSource
> by default. BasicDataSource uses a connection pool and closes all idle
> connections in close().
>
> In turn, JdbcIO calls DataSource.close() in @Teardown, so all idle
> connections should be closed and released there in case of failures.
> Though, potentially, some connections that had been handed out to a
> client and were not properly returned to the pool could be leaked…
> Anyway, I think it could be a good idea to call connection.close()
> (returning it to the connection pool) explicitly in case of any
> exception during bundle processing.
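A hedged plain-Java sketch of that suggestion, using a stand-in connection class rather than a real java.sql.Connection: whatever happens while the batch runs, the connection is closed, so a pooling DataSource would get it back instead of leaking it.

```java
// Sketch (hypothetical names): close the connection explicitly on any
// exception during bundle processing, so a pooling DataSource gets it
// back instead of leaking it.
public class CloseOnFailureSketch {
    /** Stand-in for a pooled JDBC connection. */
    static class FakeConnection implements AutoCloseable {
        boolean open = true;
        @Override public void close() { open = false; }
    }

    /** Runs the batch; the connection is closed (returned to the
     *  pool) on both the success path and the failure path. */
    static void runBatch(FakeConnection conn, Runnable batch) {
        try {
            batch.run();
        } finally {
            conn.close();
        }
    }
}
```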
>
> Probably JB may provide more details as original author of JdbcIO.
>
>> On 14 Jan 2019, at 21:37, Kenneth Knowles <k...@apache.org> wrote:
>>
>> Hi Jonathan,
>>
>> JdbcIO.write() just invokes this DoFn:
>> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
>>
>> It establishes a connection in @StartBundle and then in @FinishBundle
>> it commits a batch and closes the connection. If an error happens in
>> @StartBundle or @ProcessElement, there will be a retry with a fresh
>> instance of the DoFn, which will establish a new connection. It looks
>> like neither @StartBundle nor @ProcessElement closes the connection,
>> so I'm guessing that the old connection sticks around because the
>> worker process was not terminated. So the Beam runner and Dataflow
>> service are working as intended, and this is an issue with JdbcIO,
>> unless I've made a mistake in my reading or analysis.
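The lifecycle above can be simulated in plain Java (this is not real Beam code; all names are hypothetical): a connection opened in startBundle leaks when processElement throws, because the retry uses a fresh instance while the old connection stays open in the still-running worker process. A teardown hook acts as the safety net.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simulation of the leak: failed bundles leave a connection open
// unless something like @Teardown closes it.
public class LifecycleLeakSketch {
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    private boolean connected = false;

    void startBundle() { OPEN_CONNECTIONS.incrementAndGet(); connected = true; }

    void processElement(boolean fail) {
        if (fail) throw new RuntimeException("simulated I/O error");
    }

    void finishBundle() { closeConnection(); }

    /** Safety net: close the connection if the bundle never finished. */
    void teardown() { if (connected) closeConnection(); }

    private void closeConnection() {
        OPEN_CONNECTIONS.decrementAndGet();
        connected = false;
    }
}
```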
>>
>> Would you mind reporting these details
>> to https://issues.apache.org/jira/projects/BEAM/ ?
>>
>> Kenn
>>
>> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron
>> <jonathan.per...@lumapps.com> wrote:
>>
>> Hello !
>>
>> My question is maybe mainly GCP-oriented, so I apologize if it is
>> not fully related to the Beam community.
>>
>> We have a streaming pipeline running on Dataflow which writes data
>> to a PostgreSQL instance hosted on Cloud SQL. This database suffers
>> from connection leak spikes on a regular basis:
>>
>> <ofbkcnmdfbgcoooc.png>
>>
>> The connections are kept alive until the pipeline is
>> canceled/drained:
>>
>> <gngklddbhnckgpni.png>
>>
>> We are writing to the database with:
>>
>> - individual DoFns where we open/close the connection using the
>> standard JDBC try/catch (SQLException ex)/finally statements;
>>
>> - a Pipeline.apply(JdbcIO.<SessionData>write()) operation.
>>
>> I observed that these spikes happen most of the time after I/O
>> errors with the database. Has anyone observed the same situation?
>>
>> I have several questions/observations; please correct me if I am
>> wrong (I am not from the Java world, so some may seem pretty
>> trivial):
>>
>> - Does the apply method handle SQLException or I/O errors?
>>
>> - Would the use of a connection pool prevent such behaviours? If
>> so, how would one implement it to allow all workers to use it?
>> Could it be implemented with JDBC connection pooling?
>>
>> I am worried about serialization if one were to pass a Connection
>> object as an argument to a DoFn.
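The usual workaround for that serialization worry can be sketched as follows (all names here are hypothetical): never serialize a Connection or a pool inside a DoFn. Serialize only configuration, and keep one lazily created pool per worker JVM in a static holder that every instance shares.

```java
// Sketch: a serializable class holding only configuration; the pool
// itself lives in a static field created on first use in each JVM.
public class PerWorkerPoolSketch implements java.io.Serializable {
    /** Stand-in for a pooled DataSource (e.g. a BasicDataSource). */
    static class FakePool {
        final String url;
        FakePool(String url) { this.url = url; }
    }

    // One pool per worker JVM, shared by every instance of this class.
    private static volatile FakePool pool;

    private final String jdbcUrl;   // serializable configuration only

    public PerWorkerPoolSketch(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

    /** Lazily create the shared pool (double-checked locking). */
    FakePool pool() {
        if (pool == null) {
            synchronized (PerWorkerPoolSketch.class) {
                if (pool == null) pool = new FakePool(jdbcUrl);
            }
        }
        return pool;
    }
}
```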
>>
>> Thank you in advance for your comments and reactions.
>>
>> Best regards,
>>
>> Jonathan
>>
>
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com