Dear all,

It happened again on Friday morning:

You can see a baseline in the connection count from the 16th to the 18th.

Looking at pg_stat_activity, all connections are in use, even when the pipeline is not running at 100 % capacity (my use case is processing data from a platform which sees little use during the weekend).
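
For anyone who wants to run the same check, a query along these lines shows the per-state connection counts (the JDBC URL is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConnCheck {
      public static void main(String[] args) throws Exception {
        // Placeholder URL/credentials for the Cloud SQL instance.
        String url = "jdbc:postgresql://<host>/<db>?user=<user>&password=<password>";
        try (Connection c = DriverManager.getConnection(url);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                 "SELECT state, count(*) FROM pg_stat_activity GROUP BY state")) {
          // Each row: a connection state (active, idle, ...) and its count.
          while (rs.next()) {
            System.out.println(rs.getString(1) + ": " + rs.getLong(2));
          }
        }
      }
    }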

I opened a ticket on the project JIRA: https://issues.apache.org/jira/browse/BEAM-6475. I added as much information as I could gather, but let me know if anything else is required.

Best regards,

Jonathan

On 18/01/2019 06:17, Kenneth Knowles wrote:
My mistake - using @Teardown in this way is a good approach. It may not always be executed, but as Reuven says, when that happens it means the process died.

Kenn

On Thu, Jan 17, 2019 at 9:31 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

    Hi,

    I don't think we have a connection leak in normal behavior.

    The actual SQL statement is executed in @FinishBundle, where the
    connection is closed.

    The processElement only adds records to the batch.
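
    To make the lifecycle concrete, the flow is roughly the following (a
    simplified sketch, not the exact JdbcIO code):

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;
        import javax.sql.DataSource;
        import org.apache.beam.sdk.transforms.DoFn;
        import org.apache.commons.dbcp2.BasicDataSource;

        class WriteFn extends DoFn<String, Void> {
          private transient DataSource dataSource;
          private transient Connection connection;
          private transient PreparedStatement statement;

          @Setup
          public void setup() {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:postgresql://<host>/<db>"); // placeholder
            dataSource = ds;
          }

          @StartBundle
          public void startBundle() throws SQLException {
            connection = dataSource.getConnection();
            statement = connection.prepareStatement(
                "INSERT INTO t (v) VALUES (?)"); // placeholder statement
          }

          @ProcessElement
          public void processElement(ProcessContext c) throws SQLException {
            statement.setString(1, c.element()); // only accumulates records
            statement.addBatch();
          }

          @FinishBundle
          public void finishBundle() throws SQLException {
            // NOTE: if executeBatch() throws here, close() is never
            // reached -- the suspected source of the leaked connections.
            statement.executeBatch(); // the actual SQL runs here
            statement.close();
            connection.close();      // the connection is closed here
          }
        }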

    Does it mean that an exception occurs during the batch addition?

    Regards
    JB

    On 17/01/2019 12:41, Alexey Romanenko wrote:
    > Kenn,
    >
    > I’m not sure that we have a connection leak in JdbcIO, since a new
    > connection is obtained from an instance of /javax.sql.DataSource/
    > (created in @Setup), which is /org.apache.commons.dbcp2.BasicDataSource/
    > by default. /BasicDataSource/ uses a connection pool and closes all
    > idle connections in "close()".
    >
    > In turn, JdbcIO calls /DataSource.close()/ in @Teardown, so all idle
    > connections should be closed and released there in case of failures.
    > Though, potentially some connections that were handed out to the
    > client before and not properly returned to the pool could be leaked…
    > Anyway, I think it could be a good idea to call "/connection.close()/"
    > (return it to the connection pool) explicitly in case any exception
    > happens during bundle processing.
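    >
    > Something like this is what I have in mind (just a sketch of the
    > idea, not tested):
    >
    >     import java.sql.Connection;
    >     import java.sql.PreparedStatement;
    >     import java.sql.SQLException;
    >
    >     class BatchFlusher {
    >       // Always return the pooled connection, even on failure.
    >       static void flushBatch(Connection connection,
    >           PreparedStatement statement) throws SQLException {
    >         try {
    >           statement.executeBatch();
    >         } finally {
    >           // For a pooled connection, close() releases it back to
    >           // the pool instead of closing the underlying socket.
    >           statement.close();
    >           connection.close();
    >         }
    >       }
    >     }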
    >
    > Probably JB can provide more details, as the original author of JdbcIO.
    >
    >> On 14 Jan 2019, at 21:37, Kenneth Knowles <k...@apache.org> wrote:
    >>
    >> Hi Jonathan,
    >>
    >> JdbcIO.write() just invokes this DoFn:
    >> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L765
    >>
    >> It establishes a connection in @StartBundle and then in @FinishBundle
    >> it commits a batch and closes the connection. If an error happens in
    >> @StartBundle or @ProcessElement, there will be a retry with a fresh
    >> instance of the DoFn, which will establish a new connection. It looks
    >> like neither @StartBundle nor @ProcessElement closes the connection,
    >> so I'm guessing that the old connection sticks around because the
    >> worker process was not terminated. So the Beam runner and Dataflow
    >> service are working as intended, and this is an issue with JdbcIO,
    >> unless I've made a mistake in my reading or analysis.
    >>
    >> Would you mind reporting these details
    >> to https://issues.apache.org/jira/projects/BEAM/?
    >>
    >> Kenn
    >>
    >> On Mon, Jan 14, 2019 at 12:51 AM Jonathan Perron
    >> <jonathan.per...@lumapps.com> wrote:
    >>
    >>     Hello!
    >>
    >>     My question is perhaps mainly GCP-oriented, so I apologize if
    >>     it is not fully related to the Beam community.
    >>
    >>     We have a streaming pipeline running on Dataflow which writes
    >>     data to a PostgreSQL instance hosted on Cloud SQL. This database
    >>     is suffering from connection leak spikes on a regular basis:
    >>
    >>     <ofbkcnmdfbgcoooc.png>
    >>
    >>     The connections are kept alive until the pipeline is
    >>     canceled/drained:
    >>
    >>     <gngklddbhnckgpni.png>
    >>
    >>     We are writing to the database with:
    >>
    >>     - individual DoFns where we open/close the connection using the
    >>     standard JDBC try/catch (SQLException ex)/finally statements;
    >>
    >>     - a Pipeline.apply(JdbcIO.<SessionData>write()) operation.
    >>
    >>     I observed that these spikes happen most of the time after I/O
    >>     errors with the database. Has anyone observed the same situation?
    >>
    >>     I have several questions/observations; please correct me if I am
    >>     wrong (I am not from the Java world, so some of these may seem
    >>     pretty trivial):
    >>
    >>     - Does the apply method handle SQLException or I/O errors?
    >>
    >>     - Would the use of a connection pool prevent such behaviours? If
    >>     so, how would one implement it to allow all workers to use it?
    >>     Could it be implemented with JDBC connection pooling?
    >>
    >>     I am worried about serialization if one were to pass a
    >>     Connection object as an argument to a DoFn.
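    >>
    >>     Would something like this be the right shape, keeping the pool
    >>     out of serialization by creating it in @Setup? (Names here are
    >>     just illustrative.)
    >>
    >>         import org.apache.beam.sdk.transforms.DoFn;
    >>         import org.apache.commons.dbcp2.BasicDataSource;
    >>
    >>         class PooledWriteFn extends DoFn<String, Void> {
    >>           // transient: the pool is created on each worker in @Setup,
    >>           // so the non-serializable pool never travels with the DoFn.
    >>           private transient BasicDataSource pool;
    >>
    >>           @Setup
    >>           public void setup() {
    >>             pool = new BasicDataSource();
    >>             pool.setUrl("jdbc:postgresql://<host>/<db>"); // placeholder
    >>             pool.setMaxTotal(8); // cap connections per worker
    >>           }
    >>
    >>           @ProcessElement
    >>           public void processElement(ProcessContext c) {
    >>             // write logic elided in this sketch
    >>           }
    >>
    >>           @Teardown
    >>           public void teardown() throws Exception {
    >>             if (pool != null) {
    >>               pool.close(); // closes idle pooled connections
    >>             }
    >>           }
    >>         }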
    >>
    >>     Thank you in advance for your comments and reactions.
    >>
    >>     Best regards,
    >>
    >>     Jonathan
    >>
    >

    --
    Jean-Baptiste Onofré
    jbono...@apache.org
    http://blog.nanthrax.net
    Talend - http://www.talend.com
