[
https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sangjin Lee updated MAPREDUCE-6237:
-----------------------------------
Labels: 2.6.1-candidate (was: )
> Multiple mappers with DBInputFormat don't work because of reusing connections
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-6237
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Kannan Rajah
> Assignee: Kannan Rajah
> Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: mapreduce-6237.patch, mapreduce-6237.patch,
> mapreduce-6237.patch
>
>
> DBInputFormat.createDBRecordReader reuses a single JDBC connection across
> instances of DBRecordReader. This is unsafe, because a reader that closes the
> connection breaks every other reader sharing it. Each reader should get its
> own connection. If performance is a concern, we should use connection pooling
> instead.
> I looked at DBOutputFormat.getRecordWriter. It creates a new Connection
> object for each DBRecordWriter. So can we just change DBInputFormat to create
> a new Connection every time? The connection reuse code was added as part of
> the fix for the connection leak bug in MAPREDUCE-1443. Is there any reason
> for caching the connection?
> We observed this issue in a customer setup that reads data from MySQL using
> Pig. According to the customer, the query returns two records, which causes
> Pig to create two instances of DBRecordReader. These two instances share the
> same database connection. The first DBRecordReader extracts the first record
> from MySQL just fine, but then closes the shared connection. When the second
> DBRecordReader runs, it tries to execute a query for the second record on the
> closed connection, which fails. If we set mapred.map.tasks to 1, the query
> succeeds.
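
The failure mode described above can be reproduced without a database. The
sketch below is only an illustration, not the Hadoop code or the attached
patch: FakeConnection and FakeRecordReader are hypothetical stand-ins for
java.sql.Connection and DBRecordReader, and the SQL strings are made up. It
assumes each reader closes its connection when it finishes, as the report
describes.

```java
public class SharedConnectionDemo {

    /** Hypothetical stand-in for java.sql.Connection (not a JDBC or Hadoop class). */
    static class FakeConnection implements AutoCloseable {
        private boolean closed = false;

        String query(String sql) {
            if (closed) {
                throw new IllegalStateException("connection is closed");
            }
            return "row for: " + sql;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for DBRecordReader: reads one split, then closes its connection. */
    static class FakeRecordReader {
        private final FakeConnection connection;
        private final String splitQuery;

        FakeRecordReader(FakeConnection connection, String splitQuery) {
            this.connection = connection;
            this.splitQuery = splitQuery;
        }

        String readAndClose() {
            try {
                return connection.query(splitQuery);
            } finally {
                connection.close();
            }
        }
    }

    /** Two readers share one connection, as in the reported behaviour. */
    static int runWithSharedConnection() {
        FakeConnection shared = new FakeConnection();
        return run(
            new FakeRecordReader(shared, "SELECT * FROM t LIMIT 1 OFFSET 0"),
            new FakeRecordReader(shared, "SELECT * FROM t LIMIT 1 OFFSET 1"));
    }

    /** Each reader gets its own connection, as the report proposes. */
    static int runWithSeparateConnections() {
        return run(
            new FakeRecordReader(new FakeConnection(), "SELECT * FROM t LIMIT 1 OFFSET 0"),
            new FakeRecordReader(new FakeConnection(), "SELECT * FROM t LIMIT 1 OFFSET 1"));
    }

    /** Runs the readers one after another and counts how many succeed. */
    private static int run(FakeRecordReader... readers) {
        int succeeded = 0;
        for (FakeRecordReader reader : readers) {
            try {
                reader.readAndClose();
                succeeded++;
            } catch (IllegalStateException e) {
                // A reader found its connection already closed by another reader.
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("shared connection:    "
            + runWithSharedConnection() + " of 2 readers succeeded");
        System.out.println("separate connections: "
            + runWithSeparateConnections() + " of 2 readers succeeded");
    }
}
```

With the shared connection only the first reader succeeds; with one connection
per reader both do. A connection pool would presumably behave like the second
case, provided each reader borrows its own connection from the pool.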