[ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=78957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-78957
 ]

ASF GitHub Bot logged work on BEAM-3500:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Mar/18 18:00
            Start Date: 09/Mar/18 18:00
    Worklog Time Spent: 10m 
      Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173522401
 
 

 ##########
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##########
 @@ -297,6 +364,39 @@ public DataSourceConfiguration withConnectionProperties(
       return builder().setConnectionProperties(connectionProperties).build();
     }
 
+    /**
+     * Allows to tweak the {@link PoolingDataSource} created internally by 
{@link JdbcIO}.
+     */
+    public DataSourceConfiguration withPoolConfiguration(int maxTotal,
 
 Review comment:
   Yeah I understand the intention, but these parameters are intended to be 
tuned according to the user's particular usage pattern, and here the usage 
pattern is almost completely outside the user's control - it's hidden behind:
   
   * implementation details of JdbcIO which are allowed to change at any moment 
without notice (and in fact *are* changing - this PR being an example)
   * implementation details of the particular runner, which also can do 
anything it wants as long as it complies with the Beam model; different runners 
do different things at different times with different data
   
   So there does not exist any goal that can be reliably accomplished by 
changing these parameters, because any user's assumptions of "what will happen 
if I change this parameter to value X" are going to be wrong.
   
   This is covered in 
https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose
 - I'm happy to expand that section with the argument above, if you think 
that'd be useful.
   
   validationQuery is not required either. AFAIK since JDBC4 (introduced in 
2006, with Java 6), JDBC connections can validate themselves in a 
driver-specific fashion via connection.isValid(), and dbcp takes advantage of 
that.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 78957)
    Time Spent: 2h 50m  (was: 2h 40m)

> JdbcIO: Improve connection management
> -------------------------------------
>
>                 Key: BEAM-3500
>                 URL: https://issues.apache.org/jira/browse/BEAM-3500
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-jdbc
>    Affects Versions: 2.2.0
>            Reporter: Pawel Bartoszek
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires connection in {{@Setup}} and release it in 
> {{@Teardown}} methods, which means that connection might stay opened for days 
> in streaming job case. Keeping single connection open for so long might be 
> very risky as it's exposed to database, network etc issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that connection would be taken from the connection pool in 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections etc.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted DataSourceFactory rather than DataSource 
> itself. I am saying that because sink checks if DataSource implements 
> `Serializable` interface, which make it impossible to pass 
> BasicDataSource(used internally by sink) as it doesn’t implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>      DataSource create();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to