On Thu, Jun 1, 2017 at 9:39 PM, Elizabeth Ploetz <plo...@ksu.edu> wrote:
>>> However, if most runs are group scheme, a quick check could show whether
>>> jumps are present in runs that i) do PP-PME tuning ii) if logs got
>>> truncated during continuation, at least whether they use separate PME
>>> ranks (because otherwise CPU-only runs don't tune).
>>
>> i) If grepping "timed" from the LOG file does not give any output, does
>> that mean there was no PP-PME tuning? (Sorry for the stupid question. I'm
>> not sure which piece of information from the LOG file is going to answer
>> whether or not there was PP-PME tuning.)

> Do you run with -append? If so, the log file gets truncated too, but I do
> not recall exactly where, or whether the PP-PME balancing messages are
> removed, but it's not hard to try -- just run with separate PME ranks and
> too few of them (e.g. 1 out of 12) and that will trigger load balancing.

I run with -noappend, so I think the LOG files are intact. I get load
balancing messages.

> On second thought, instead of testing with Verlet, you might want to just
> do the above and try to directly observe the anomalies after the balancer.

>> If so, perhaps there is a correlation between having PP-PME tuning and
>> having a jump. Please see this link
>> <http://i1243.photobucket.com/albums/gg545/ploetz/volumeJumps_zps8hmlghtn.png>.
>> *If* the volume for 40-60ns of row 3 is the correct system volume, then
>> all the data in this figure is consistent with there being a jump when
>> there is PP-PME tuning. (Please note that while the data at 1 bar looks
>> okay in this case, and the data at elevated pressures does not, this is
>> not always true. We get jumps at 1 bar as well sometimes.)
>>
>> ii) These are all CPU-only runs. The simulations always use separate PME
>> ranks.
>>
>> Please let me know if any particular data from the LOG file would be
>> helpful.

> It would be easier if you provided logs that we can look through.

Please see three log files here:
https://drive.google.com/open?id=0BznaVquT5XVyVkNjMHh2eXJ5amc .
These correspond to the third row (6 kbar) data I linked to in my previous
post (directly above, the volumeJumps.png image). The 40-60ns log doesn't
have the "PP-PME Load Balancing" section, but the other two do.

>>> If I understood correctly, it's only group scheme runs where this has
>>> been observed, so it could be some newer feature/change that interacts
>>> badly with the group scheme.
>>
>> You are correct, so far we have not seen any jumps with Verlet.
>>
>>> BTW, do you have any data with 4.5?
>>
>> I have a few old simulations with version 4.5.3 (none with 4.5, sorry).
>> They were all run either as inexact continuations (i.e., I did not
>> provide checkpoint files when running multiple short runs to create one
>> long simulation) or as single trajectories that I had killed at various
>> points and then continued using checkpoint files and -append. I don't
>> have a huge data set with 4.5.3, but none of them exhibited jumps!
>>
>>> I'd suggest that (especially if investigation of the current data does
>>> not reveal the reasons) you pick a setup where you seemed to get the
>>> anomaly and run it with the same settings using the Verlet scheme, as
>>> lots of short runs with restarts in a loop.
>>
>> Thanks, we are doing this test.
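
For anyone wanting to repeat the log check discussed in this thread across
many continuation segments, here is a minimal Python sketch. It only tests
whether each log contains the "PP-PME Load Balancing" section title quoted
above. The exact wording of that title in a given GROMACS version is an
assumption, so the match is case-insensitive, and the function names
(has_pme_tuning, scan_logs) are hypothetical helpers, not part of GROMACS.

```python
from pathlib import Path

# Marker taken from the section title quoted in this thread. Whether every
# GROMACS version prints exactly this wording is an assumption.
MARKER = "pp-pme load balancing"


def has_pme_tuning(log_text, marker=MARKER):
    """Return True if the log text contains the marker (case-insensitive)."""
    return marker.lower() in log_text.lower()


def scan_logs(paths):
    """Map each log file name to whether the marker was found in that file."""
    results = {}
    for path in paths:
        # errors="replace" keeps the scan going if a log has stray bytes.
        text = Path(path).read_text(errors="replace")
        results[Path(path).name] = has_pme_tuning(text)
    return results
```

Calling scan_logs on the list of segment logs gives a dictionary of file
name to True/False, which can then be compared by eye with the segments
where a volume jump appears. A missing marker only means the string was not
found; it does not by itself prove that no tuning took place.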