I think I know what's going on. I believe the MySQL process is not running under DMTCP. If that's correct, I'll just repeat what I wrote in a previous thread.
Basically, the problem is that there's an "external" socket that DMTCP is trying to drain at checkpoint time. The external socket here is the socket connection between the Java process and the MySQL process. (More background below.) There are two alternatives I can think of off the top of my head:

1) Identify this external socket, and ask DMTCP to "blacklist" it to avoid draining it at checkpoint time. For example, see the method `isBlacklistedTcp()` in socketconnection.cpp. (https://github.com/dmtcp/dmtcp/blob/2.5/src/plugin/ipc/socket/socketconnection.cpp#L220)

2) Run the MySQL process under DMTCP. We have tried running the entire LAMP stack under DMTCP in the past, and it works.

Background: For checkpointing, DMTCP classifies a socket into one of two categories -- external or internal. An internal socket is one whose two endpoints are both running under DMTCP. In this case, DMTCP will, at checkpoint time, first quiesce the two processes, and then capture the in-flight data by "draining" the socket from both ends. On restart, DMTCP will restore the socket and put the captured data back on the network. I believe the error you see at checkpoint time is an artifact of the draining done by DMTCP. (For details about how this is done, see sections 4.3 and 4.4 in http://dmtcp.sourceforge.net/papers/dmtcp.pdf).

The difficult case is when only one end, for example, the client in a client-server application, is running under DMTCP. In this case, the socket that the client uses to talk to the server needs to be marked as "external", which means that DMTCP will not try to drain the socket at checkpoint time. (There is a heuristic we use to detect an external socket, but it does not always work.) On restart, the socket is presented as a dead socket to the client, and it's the client's responsibility to either ignore this dead socket if it's unimportant, or recover by creating a new socket if it's important.
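To make the "recover by creating a new socket" case concrete, here is a minimal Java sketch of what the client side could look like for JDBC. It is only an illustration under assumptions, not something DMTCP provides: the names `JdbcReconnect`, `Opener`, and `Work` are hypothetical, and the rule for deciding that a failure is a dead link (a `SocketException` or an SQLSTATE beginning with "08" anywhere in the cause chain) is a heuristic I am assuming fits the trace you posted.

```java
import java.net.SocketException;
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Sketch: re-open a JDBC connection that died across a checkpoint or restart.
 * JdbcReconnect, Opener and Work are hypothetical names, not DMTCP or JDBC APIs.
 */
final class JdbcReconnect {
    /** Opens a fresh connection, e.g. () -> DriverManager.getConnection(url, user, pass). */
    interface Opener {
        Connection open() throws SQLException;
    }

    /** One unit of database work that needs a live connection. */
    interface Work<T> {
        T run(Connection conn) throws SQLException;
    }

    private final Opener opener;
    private Connection conn; // null until first use, and again after a failure

    JdbcReconnect(Opener opener) {
        this.opener = opener;
    }

    /**
     * Heuristic (an assumption): the link is considered dead if any exception in the
     * cause chain is a SocketException or an SQLException with SQLSTATE class "08".
     */
    static boolean isConnectionFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketException) {
                return true;
            }
            if (c instanceof SQLException) {
                String state = ((SQLException) c).getSQLState();
                if (state != null && state.startsWith("08")) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Runs the work, reconnecting and retrying only when the connection itself failed. */
    synchronized <T> T run(Work<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (conn == null) {
                    conn = opener.open();
                }
                return work.run(conn);
            } catch (SQLException e) {
                if (!isConnectionFailure(e)) {
                    throw e; // a genuine SQL error; reconnecting would not help
                }
                last = e;
                discard();
            }
        }
        throw last;
    }

    /** Drops the current connection, ignoring errors from closing a dead socket. */
    private void discard() {
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException ignored) {
                // the socket is already dead; nothing useful to do here
            }
            conn = null;
        }
    }
}
```

The opener would typically wrap `DriverManager.getConnection(...)` with your own URL and credentials. Statements belong to the connection that created them, so any `PreparedStatement` has to be prepared inside the `Work` callback from the connection it is handed. Also note that a retried update runs again from the start, so this is only safe for work that can be repeated.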
If you go with the first alternative I mentioned above, one possibility is to re-initiate the socket (JDBC) connection on restart. Since we don't have a Java API yet (we do have a Python API), it's a little tricky but doable. Could you try to catch the connection exception and re-initiate the JDBC connection? I believe it's mostly stateless so restoring the state shouldn't be a concern.

Another possibility is to write a DMTCP plugin that knows how to checkpoint and restore Java's connections to MySQL. This is a more modular approach in that you don't have to worry about modifying the core application code and maintaining it as it evolves.

On Thu, Apr 28, 2016 at 01:43:31PM +0530, Pratyush Patel wrote:
> Hello Rohan,
>
> Thank you for the reply. You were right, there was indeed a command
> (which created a process) being run by our Java program while the
> checkpointing took place. I was able to get it working by running the
> external process as a bash script and calling it in Java instead.
>
> I have one further question. While using DMTCP to checkpoint a simple
> JDBC application (example:
> http://www.vogella.com/tutorials/MySQLJava/article.html#javaconnection),
> after opening the database connection, DMTCP completes the
> checkpointing, but the program then results in an error as follows:
>
> $ dmtcp_launch java MyProgram
> dmtcp_coordinator starting...
> Host: PC (127.0.1.1)
> Port: 7779
> Checkpoint Interval: disabled (checkpoint manually instead)
> Exit on last client: 1
> Backgrounding...
> /* Program Output */
> /* Program Output */
> /* Checkpoint command issued, checkpoint complete */
> Exception in thread "main" java.sql.SQLException: Could not retrieve
> transation read-only status server
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:959)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:949)
>     at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3967)
>     at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3938)
>     at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2295)
>     at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2262)
>     at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2246)
>     at MySQLAccess.readDataBase(Java2MySql.java:66)
>     at MySQLAccess.main(Java2MySql.java:133)
> Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> The last packet successfully received from the server was 3,026
> milliseconds ago. The last packet sent successfully to the server was
> 3,027 milliseconds ago.
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
>     at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
>     at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3965)
>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2578)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2769)
>     at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
>     at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3961)
>     ... 6 more
> Caused by: java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>     at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3946)
>     ... 12 more
>
>
> When I restart the program using dmtcp_restart_script.sh, the same
> error as above appears.
>
> Please note that the program does work if executed normally. I believe
> that somehow, checkpointing is changing the state(?) of the database
> connection. Is there a way to get this working?
>
> Thank you,
> Pratyush
>
> On Tue, Apr 26, 2016 at 7:00 PM, Rohan Garg <rohg...@ccs.neu.edu> wrote:
> > Hi Pratyush,
> >
> > Checkpointing of files and Java is supported out of the box.
> > There are various runtime options (plugins) that you can use to modify
> > the default behavior according to your requirements. I'm unable to
> > reproduce the issue that you have reported with your example locally.
> > I'm using the latest DMTCP source from Github. Here's what I did:
> >
> > $ javac AppMain.java
> > $ dmtcp_launch java AppMain
> > # Checkpoint and kill the application
> > $ dmtcp_restart ckpt_java_*.dmtcp
> >
> > The error messages you are getting have nothing to do with checkpointing
> > of open files. It seems like your application has some connections to
> > other processes that are not running under DMTCP. Could you please verify
> > if that's the case? Also, could you please try running with the latest
> > release?
> >
> > Thanks,
> > Rohan
> >
> > On Tue, Apr 26, 2016 at 11:57:30AM +0530, Pratyush Patel wrote:
> >> Hello,
> >>
> >> I am using DMTCP to try and checkpoint a simple Java program, source
> >> of which can be found at http://pastie.org/10801783.
> >>
> >> In the program input.txt is a large file which contains several lines
> >> which I am trying to print. I am checkpointing the program by sending
> >> the signal to checkpoint externally through the dmtcp_coordinator.
> >>
> >> Although, I expected the checkpointing process to work with the open
> >> file descriptor, it appears that dmtcp is unable to checkpoint the
> >> program properly, and results in some error messages like:
> >>
> >> [43000] NOTE at timerlist.cpp:107 in removeStaleClockIds;
> >> REASON='Removing stale clock'
> >> staleClockIds[i] = -100842
> >> [40000] WARNING at kernelbufferdrainer.cpp:125 in onTimeoutInterval;
> >> REASON='JWARNING(false) failed'
> >> _dataSockets[i]->socket().sockfd() = 11
> >> buffer.size() = 140
> >> WARN_INTERVAL_SEC = 10
> >> Message: Still draining socket... perhaps remote host is not running
> >> under DMTCP?
> >> [40000] WARNING at kernelbufferdrainer.cpp:125 in onTimeoutInterval;
> >> REASON='JWARNING(false) failed'
> >> _dataSockets[i]->socket().sockfd() = 11
> >> buffer.size() = 140
> >> WARN_INTERVAL_SEC = 10
> >> Message: Still draining socket... perhaps remote host is not running
> >> under DMTCP?
> >>
> >> In case it matters, I am using Ubuntu 15.10 with 4.3.0-040300-generic
> >> kernel.
> >> I am also using the latest dmtcp source code available in Ubuntu
> >> repository.
> >>
> >> Could anyone please let me know why this happens and whether there is
> >> a way to get it working?
> >>
> >> Thanks,
> >> Pratyush Patel
> >>
> >> ------------------------------------------------------------------------------
> >> Find and fix application performance issues faster with Applications
> >> Manager
> >> Applications Manager provides deep performance insights into multiple
> >> tiers of
> >> your business applications. It resolves application problems quickly and
> >> reduces your MTTR. Get your free trial!
> >> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> >> _______________________________________________
> >> Dmtcp-forum mailing list
> >> Dmtcp-forum@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum