Incorrect DBInputFormat transaction context
-------------------------------------------
Key: HADOOP-5960
URL: https://issues.apache.org/jira/browse/HADOOP-5960
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Affects Versions: 0.20.0, 0.19.1, 0.19.0
Environment: Mac OSX 10.5.6, IntelliJ 7.0.5
Reporter: Yuchen
In my Map/Reduce job, I use DBInputFormat to read the original tasks because
it is convenient. I also need to update my MySQL database occasionally in our
reducer. Because I need to "update" rows rather than "insert" them, I cannot
use DBOutputFormat, so I make my own JDBC calls. I create my own connection like this:
Class.forName("com.mysql.jdbc.Driver").newInstance();
conn = DriverManager.getConnection(jdbcUrl);
However, every time I try to do the update, I get an SQL exception,
"transaction lock time out; try restarting transaction", even though I didn't
use a transaction at all in my update (setAutoCommit to false).
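
For illustration, a reducer-side update of the kind described above might look
like the following minimal sketch. The table and column names (tasks, status,
id) and the class and method names are hypothetical, since the report does not
show the schema or the update code. Unlike the report, which mentions
setAutoCommit(false), the sketch leaves auto-commit on so that each UPDATE
commits as soon as it completes, and it closes the connection when done.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch of a reducer-side update over a private JDBC connection. */
final class ReducerUpdate {

    // Hypothetical table and column names; the report does not give a schema.
    static final String UPDATE_SQL = "UPDATE tasks SET status = ? WHERE id = ?";

    /** Runs one UPDATE on the given connection and returns the affected row count. */
    static int updateWith(Connection conn, long id, String status) throws SQLException {
        // Auto-commit mode: the statement commits as soon as it completes,
        // so this connection does not keep a transaction of its own open.
        conn.setAutoCommit(true);
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, status);
            ps.setLong(2, id);
            return ps.executeUpdate();
        }
    }

    /** Opens a connection, performs the update, and always closes the connection. */
    static int updateOnce(String jdbcUrl, long id, String status) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            return updateWith(conn, id, status);
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: ReducerUpdate <jdbcUrl>");
            return;
        }
        System.out.println("rows updated: " + updateOnce(args[0], 1L, "done"));
    }
}
```

These settings apply only to the connection they are called on. A lock held by
a different open connection would still make the UPDATE wait, whatever this
connection's own auto-commit mode is.
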
Digging into the Hadoop code, I found these lines in DBInputFormat:
this.connection.setAutoCommit(false);
connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
When I comment them out (along with the connection.commit() call), everything
works fine. I also found that the connection in DBInputFormat is never closed.
I am wondering why we need to set the autocommit mode and the transaction
isolation level in DBInputFormat at all, and why I can't override them in my
JDBC call even if I explicitly set autocommit to false and the transaction
isolation level to the default (REPEATABLE READ).
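
The pattern the report finds missing in DBInputFormat (end the transaction and
close the connection after reading) can be sketched as below. This is a
standalone illustration, not Hadoop's code: the class and method names are
hypothetical, and only the two quoted settings are taken from the report.
Assuming InnoDB tables, MySQL documents that under SERIALIZABLE with autocommit
disabled, plain SELECTs become shared-lock reads, so a reader that never
commits or closes could plausibly block an UPDATE from another connection
until the lock wait times out.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: a serializable read that ends its transaction and closes its connection. */
final class SerializableRead {

    /** Counts the rows returned by query, then releases the transaction and the connection. */
    static int countRowsAndRelease(Connection conn, String query) throws SQLException {
        try {
            // The same two settings the report quotes from DBInputFormat.
            conn.setAutoCommit(false);
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            int rows = 0;
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(query)) {
                while (rs.next()) {
                    rows++;
                }
            }
            // With auto-commit off, the transaction stays open until commit or rollback.
            conn.commit();
            return rows;
        } catch (SQLException e) {
            try {
                conn.rollback(); // end the transaction on failure as well
            } catch (SQLException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        } finally {
            conn.close(); // the step the report says DBInputFormat never performs
        }
    }
}
```

Whether this explains the timeout in the reported job depends on the storage
engine and on which connections are still open when the reducer runs, neither
of which the report states.
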
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.