[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312745#comment-14312745 ]

Kannan Rajah commented on MAPREDUCE-6237:
-----------------------------------------

Created MAPREDUCE-6247 to track connection pooling.

> Multiple mappers with DBInputFormat don't work because of reusing connections
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6237
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>             Fix For: 2.6.1
>
>         Attachments: mapreduce-6237.patch, mapreduce-6237.patch, 
> mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader is reusing JDBC connections across 
> instances of DBRecordReader. This is not a good idea. We should be creating a 
> separate connection for each reader. If performance is a concern, then we 
> should be using connection pooling instead.
> I looked at DBOutputFormat.getRecordReader. It actually creates a new 
> Connection object for each DBRecordReader. So can we just change 
> DBInputFormat to create a new Connection every time, as sketched below? The 
> connection reuse code was added as part of the connection leak fix in 
> MAPREDUCE-1443. Is there any reason for caching the connection?
> We observed this issue in a customer setup where they were reading data from 
> MySQL using Pig. According to the customer, the query returns two records, 
> which causes Pig to create two instances of DBRecordReader. These two 
> instances share the database connection instance. The first DBRecordReader 
> runs and extracts the first record from MySQL just fine, but then closes the 
> shared connection instance. When the second DBRecordReader runs, it tries to 
> execute a query to retrieve the second record on the closed shared connection 
> instance, which fails; the plain-JDBC sketch below illustrates this sequence. 
> If we set mapred.map.tasks to 1, the query succeeds.
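
To make the failure sequence described above concrete, here is a minimal
plain-JDBC sketch. It uses no Hadoop classes; the class names, JDBC URL, and
queries are placeholders rather than code from the issue or the attached
patch. Two "readers" are handed the same Connection, the first one closes it,
and the second one's query then fails because the underlying connection is
already closed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SharedConnectionFailureDemo {

    /** Stand-in for a record reader that was handed a shared connection. */
    static class SharedReader {
        private final Connection sharedConnection;

        SharedReader(Connection sharedConnection) {
            this.sharedConnection = sharedConnection;
        }

        void readOneRecord(String query) throws SQLException {
            try (Statement stmt = sharedConnection.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                if (rs.next()) {
                    System.out.println("Read: " + rs.getString(1));
                }
            }
        }

        void close() throws SQLException {
            // The problematic step: closing this reader closes the connection
            // that the other reader still holds.
            sharedConnection.close();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; substitute a real MySQL URL to run.
        Connection shared = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");

        SharedReader first = new SharedReader(shared);
        SharedReader second = new SharedReader(shared);

        first.readOneRecord("SELECT name FROM t ORDER BY id LIMIT 1");
        first.close();   // closes the shared connection

        // Throws SQLException: the shared connection is already closed.
        second.readOneRecord("SELECT name FROM t ORDER BY id LIMIT 1 OFFSET 1");
    }
}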
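
For contrast, here is a minimal sketch of the per-reader connection approach
proposed in the description. It assumes a simplified stand-in class
(SimpleDBRecordReader) rather than the real Hadoop DBRecordReader, and it is
not taken from the attached patch: each reader opens and owns its own
Connection, so closing one reader no longer affects the others. A connection
pool, as tracked in MAPREDUCE-6247, could later hand out these per-reader
connections without changing this ownership model.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PerReaderConnectionSketch {

    /** Simplified stand-in for a DBRecordReader that owns its connection. */
    static class SimpleDBRecordReader implements AutoCloseable {
        private final Connection connection;

        SimpleDBRecordReader(Connection connection) {
            this.connection = connection;
        }

        Connection getConnection() {
            return connection;
        }

        @Override
        public void close() throws SQLException {
            // Closing this reader closes only its own connection; other
            // readers created for the same job keep working.
            connection.close();
        }
    }

    private final String url;
    private final String user;
    private final String password;

    PerReaderConnectionSketch(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /**
     * Open a fresh connection for every reader instead of returning a
     * connection cached in a field, which is what led to the shared,
     * prematurely closed connection described above.
     */
    SimpleDBRecordReader createRecordReader() throws SQLException {
        Connection conn = DriverManager.getConnection(url, user, password);
        conn.setAutoCommit(false);
        return new SimpleDBRecordReader(conn);
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; substitute real ones to run.
        PerReaderConnectionSketch factory = new PerReaderConnectionSketch(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");

        SimpleDBRecordReader first = factory.createRecordReader();
        SimpleDBRecordReader second = factory.createRecordReader();

        try {
            first.close();  // only the first reader's connection is closed
            System.out.println("second still open: "
                    + !second.getConnection().isClosed());
        } finally {
            second.close();
        }
    }
}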



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
