Hi Rohan,

That looks good! Thanks a lot for all the help.

For now, in addition to your patch, we are using a modified
application with an appropriate try-catch to handle restart. However,
we would like to write a DMTCP plugin to do this for future
convenience. Could you please suggest any starting points or
references? If a similar tool already exists as a plugin, for Python
or otherwise, it would be great if you could point us to it.
Documentation on how to approach writing such a plugin would be
helpful as well. Having read the research paper, we now have a basic
idea of how DMTCP manages socket connections (draining and restoring),
as you explained.
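
A minimal sketch of the try-catch approach mentioned above, for
reference. It is an illustration under assumptions, not the actual
application code: the names RetryOnRestart, DbAction, Reconnector and
withRetry are hypothetical, and it catches java.sql.SQLException
broadly so that it compiles without the MySQL driver (in this thread
the failure surfaced as
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException). The main
method simulates a connection that is dead after restart.

```java
import java.sql.SQLException;

public class RetryOnRestart {
    /** Hypothetical stand-in for "connect if needed, then run the query". */
    interface DbAction<T> {
        T run() throws SQLException;
    }

    /** Hypothetical stand-in for "drop the dead connection, open a new one". */
    interface Reconnector {
        void reconnect() throws SQLException;
    }

    /**
     * Runs the action. On SQLException, reconnects and retries, at most
     * maxRetries times; rethrows the last failure once retries run out.
     */
    static <T> T withRetry(DbAction<T> action, Reconnector reconnector,
                           int maxRetries) throws SQLException {
        int attempts = 0;
        while (true) {
            try {
                return action.run();
            } catch (SQLException e) {
                if (attempts >= maxRetries) {
                    throw e;            // give up: retries exhausted
                }
                attempts++;
                reconnector.reconnect();
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Simulated state: the socket is dead until reconnect() is called.
        final boolean[] connected = {false};
        String result = withRetry(
            () -> {
                if (!connected[0]) {
                    throw new SQLException("Communications link failure");
                }
                return "query ok";
            },
            () -> connected[0] = true,
            1);
        System.out.println(result);
    }
}
```

In a real application, the action would execute the JDBC statement and
the reconnector would close the stale Connection and obtain a new one.
Bounding the retries avoids looping forever if the server is truly
unreachable.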

Thank you,
Pratyush

On Fri, Apr 29, 2016 at 1:58 AM, Rohan Garg <rohg...@ccs.neu.edu> wrote:
> I investigated this a little further. I have a patch (see attached)
> for DMTCP that'll allow you to avoid the connection exception thrown
> by Java when resuming after a checkpoint. It doesn't, however, prevent
> the exception thrown at restart time. There are three ways to avoid
> the restart-time exception:
>
> (a) A DMTCP plugin re-initiates connection with MySQL server. This will
>     require reverse engineering the communication protocol and replaying
>     certain messages on restart to restore the server's state;
>
> (b) Run MySQL server and your Java application under DMTCP; and
>
> (c) Modify your Java application to handle broken connection on restart.
>     Here's what I tried and it seems to work:
>
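>      boolean tryOnceMore = true;  // allow a single reconnect attempt
>      while (true) {
>        try {
>          // Connect with DB if not already connected, and execute query
>          break;
>        } catch (com.mysql.jdbc.exceptions.jdbc4.CommunicationsException e) {
>          if (tryOnceMore) {
>            // Re-initiate connection with DB, then loop to retry the query
>            tryOnceMore = false;
>          } else {
>            break;  // second failure in a row: give up
>          }
>        }
>      }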
>
> On Thu, Apr 28, 2016 at 08:43:23AM -0400, Rohan Garg wrote:
>> I think I know what's going on. I believe the MySQL process is not
>> running under DMTCP. If that's correct, I'll just repeat what I
>> wrote in a previous thread.
>>
>> Basically, the problem is that there's an "external" socket that
>> DMTCP is trying to drain at checkpoint time. The external socket
>> here is the socket connection between the Java process and the MySQL
>> process. (More background below.)
>>
>> There are two alternatives I can think of off the top of my head:
>>
>> 1) Identify this external socket, and ask DMTCP to
>>    "blacklist" it to avoid draining it at checkpoint time. For
>>    example, see the method `isBlacklistedTcp()` in socketconnection.cpp
>>    (https://github.com/dmtcp/dmtcp/blob/2.5/src/plugin/ipc/socket/socketconnection.cpp#L220).
>>
>> 2) Run the MySQL process under DMTCP. We have tried running
>>    the entire LAMP stack under DMTCP in the past, and it works.
>>
>> Background:
>>
>> For checkpointing, DMTCP classifies a socket into one of two
>> categories -- external and internal. A socket is internal when both
>> of its end points are running under DMTCP. In this case, DMTCP
>> will, at checkpoint time, first quiesce the two processes, and then
>> capture the in-flight data by "draining" the socket from both ends.
>> On restart, DMTCP will restore the socket and put the captured data
>> back on the network. I believe the error you see at checkpoint time
>> is an artifact of the draining done by DMTCP. (For details about
>> how this is done, see sections 4.3 and 4.4 in
>> http://dmtcp.sourceforge.net/papers/dmtcp.pdf).
>>
>> The difficult case is when only one end, for example, the client
>> in a client-server application, is running under DMTCP. In this
>> case, the socket that the client uses to talk to the server needs
>> to be marked as "external", implying that DMTCP will not try to
>> drain the socket at checkpoint time. (There is a heuristic we use
>> to detect an external socket but that does not always work.) On
>> restart, the socket is presented as a dead socket to the client,
>> and it's the client's responsibility to either ignore this dead
>> socket if it's unimportant, or recover by creating a new socket if
>> it's important.
>>
>> If you go with the first alternative I mentioned above, one possibility
>> is to re-initiate the socket (JDBC) connection on restart. Since
>> we don't have a Java API yet (we do have a Python API), it's a
>> little tricky but doable.  Could you try to catch the connection
>> exception and re-initiate the JDBC connection? I believe it's mostly
>> stateless so restoring the state shouldn't be a concern.
>>
>> Another possibility is to write a DMTCP plugin that knows how to
>> checkpoint and restore Java's connections to MySQL. This is a more
>> modular approach in that you don't have to worry about modifying
>> the core application code and maintaining it as it evolves.
>>
>> On Thu, Apr 28, 2016 at 01:43:31PM +0530, Pratyush Patel wrote:
>> > Hello Rohan,
>> >
>> > Thank you for the reply. You were right, there was indeed a command
>> > (which created a process) being run by our Java program while the
>> > checkpointing took place. I was able to get it working by running the
>> > external process as a bash script and calling it in Java instead.
>> >
>> > I have one further question. While using DMTCP to checkpoint a simple
>> > JDBC application (example:
>> > http://www.vogella.com/tutorials/MySQLJava/article.html#javaconnection),
>> > after opening the database connection, DMTCP completes the
>> > checkpointing, but the program then results in an error as follows:
>> >
>> > $ dmtcp_launch java MyProgram
>> > dmtcp_coordinator starting...
>> >     Host: PC (127.0.1.1)
>> >     Port: 7779
>> >     Checkpoint Interval: disabled (checkpoint manually instead)
>> >     Exit on last client: 1
>> > Backgrounding...
>> > /* Program Output */
>> > /* Program Output */
>> > /* Checkpoint command issued, checkpoint complete */
>> > Exception in thread "main" java.sql.SQLException: Could not retrieve
>> > transation read-only status server
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:959)
>> > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:949)
>> > at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3967)
>> > at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3938)
>> > at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2295)
>> > at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2262)
>> > at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2246)
>> > at MySQLAccess.readDataBase(Java2MySql.java:66)
>> > at MySQLAccess.main(Java2MySql.java:133)
>> > Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>> > Communications link failure
>> >
>> > The last packet successfully received from the server was 3,026
>> > milliseconds ago.  The last packet sent successfully to the server was
>> > 3,027 milliseconds ago.
>> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> > at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>> > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> > at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>> > at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
>> > at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
>> > at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3965)
>> > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2578)
>> > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758)
>> > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
>> > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2769)
>> > at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
>> > at com.mysql.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:3961)
>> > ... 6 more
>> > Caused by: java.net.SocketException: Broken pipe
>> > at java.net.SocketOutputStream.socketWrite0(Native Method)
>> > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>> > at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>> > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>> > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>> > at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3946)
>> > ... 12 more
>> >
>> >
>> > When I restart the program using dmtcp_restart_script.sh, the same
>> > error as above appears.
>> >
>> > Please note that the program does work if executed normally. I believe
>> > that somehow, checkpointing is changing the state(?) of the database
>> > connection. Is there a way to get this working?
>> >
>> > Thank you,
>> > Pratyush
>> >
>> > On Tue, Apr 26, 2016 at 7:00 PM, Rohan Garg <rohg...@ccs.neu.edu> wrote:
>> > > Hi Pratyush,
>> > >
>> > > Checkpointing of files and Java is supported out of the box. There are
>> > > various runtime options (plugins) that you can use to modify the default
>> > > behavior according to your requirements. I'm unable to reproduce the
>> > > issue that you have reported with your example locally. I'm using the
>> > > latest DMTCP source from Github. Here's what I did:
>> > >
>> > >   $ javac AppMain.java
>> > >   $ dmtcp_launch java AppMain
>> > >   # Checkpoint and kill the application
>> > >   $ dmtcp_restart ckpt_java_*.dmtcp
>> > >
>> > > The error messages you are getting have nothing to do with checkpointing
>> > > of open files. It seems like your application has some connections to
>> > > other processes that are not running under DMTCP. Could you please verify
>> > > if that's the case? Also, could you please try running with the latest
>> > > release?
>> > >
>> > > Thanks,
>> > > Rohan
>> > >
>> > > On Tue, Apr 26, 2016 at 11:57:30AM +0530, Pratyush Patel wrote:
>> > >> Hello,
>> > >>
>> > >> I am using DMTCP to try and checkpoint a simple Java program, source
>> > >> of which can be found at http://pastie.org/10801783.
>> > >>
>> > >> In the program, input.txt is a large file containing several lines,
>> > >> which I am trying to print. I am checkpointing the program by sending
>> > >> the signal to checkpoint externally through the dmtcp_coordinator.
>> > >>
>> > >> Although I expected the checkpointing process to work with the open
>> > >> file descriptor, it appears that dmtcp is unable to checkpoint the
>> > >> program properly, and it produces error messages like:
>> > >>
>> > >> [43000] NOTE at timerlist.cpp:107 in removeStaleClockIds;
>> > >> REASON='Removing stale clock'
>> > >>      staleClockIds[i] = -100842
>> > >> [40000] WARNING at kernelbufferdrainer.cpp:125 in onTimeoutInterval;
>> > >> REASON='JWARNING(false) failed'
>> > >>      _dataSockets[i]->socket().sockfd() = 11
>> > >>      buffer.size() = 140
>> > >>      WARN_INTERVAL_SEC = 10
>> > >> Message: Still draining socket... perhaps remote host is not running
>> > >> under DMTCP?
>> > >> [40000] WARNING at kernelbufferdrainer.cpp:125 in onTimeoutInterval;
>> > >> REASON='JWARNING(false) failed'
>> > >>      _dataSockets[i]->socket().sockfd() = 11
>> > >>      buffer.size() = 140
>> > >>      WARN_INTERVAL_SEC = 10
>> > >> Message: Still draining socket... perhaps remote host is not running
>> > >> under DMTCP?
>> > >>
>> > >> In case it matters, I am using Ubuntu 15.10 with the
>> > >> 4.3.0-040300-generic kernel. I am also using the latest dmtcp
>> > >> source code available in the Ubuntu repository.
>> > >>
>> > >> Could anyone please let me know why this happens and whether there is
>> > >> a way to get it working?
>> > >>
>> > >> Thanks,
>> > >> Pratyush Patel
>> > >>
>> > >> ------------------------------------------------------------------------------
>> > >> Find and fix application performance issues faster with Applications 
>> > >> Manager
>> > >> Applications Manager provides deep performance insights into multiple 
>> > >> tiers of
>> > >> your business applications. It resolves application problems quickly and
>> > >> reduces your MTTR. Get your free trial!
>> > >> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
>> > >> _______________________________________________
>> > >> Dmtcp-forum mailing list
>> > >> Dmtcp-forum@lists.sourceforge.net
>> > >> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum