Sorry for the delay, but I wanted to add a few points to this discussion as I am doing my PhD in the group where dPCA was developed. I hope these will clarify a little bit what's done in the tutorial and why (took me a while to figure it out). Obviously, it would be worth to rewrite my comments in a slightly more concise form, maybe with examples, and make them available to the community on some suitable platform. I will see if I find the time. @Mark: which form should such a suggestion for an improved How-To have and where could I submit it and how?
As an introduction and following up on what Mark said, I want to point out that you do not need any GROMACS tool for doing a PCA, but you *can* use them in a way that somewhat "misuses" trr and gro files. The way it's done in the tutorial means that the covariance matrix, eigenvectors and the PCs are stored in the trr files - but beware: the trr format was meant for storing trajectory data, not PCA data. That's why Mark called it a "hack" and that's why we need the "resized.gro", as explained later. In our group, we do a lot of PCA and usually just work on plain ASCII data files as input and output files. But there is not (yet) a black-box programm where you just plug in a trajectory and in the end get nice results - you should know what's going on under the hood, as only then you can interpret the PCA results properly and judge if they make sense. What you need for actually performing a PCA are 1) the data you want to perform the PCA on. This could be an MD trajectory (in case there is *really* a good reason to use cartesian coordinates, internal coordinates are more advisable), but as well an ASCII file where you saved all the angles or distances or whatever coordinates you want to use. Personally, I like the MDAnalysis Python library as it easily allows to monitor e.g. a few angles or distances or whatever quantity seems a good idea to try as input for a PCA. You may also kick out irrelevant observables to speed up things and reduce noise. 2) a tool to calculate the covariance matrix of your data, diagonalize it and project the data on the eigenvectors, i.e. calculate the PCs. In this step, it might be a good idea to save some files, namely the statistics of the data, covariance matrix, eigenvectors and -values and of course the PCs which is a LxD file with L lines (observations) for the L frames you had in the trajectory and D variables (observables) that your input file had. Note: if you started with an ASCII file, it also had dimensions LxD (PCA is just a rotation of your coordinates). 3) analysis tools to create whatever plots you like. Usually, we use 1D and 2D histograms as well as autocorrelation decays. For step 2, I can recommend a tool written by a colleague of mine called fastCPA that you can find at http://lettis.github.io/FastPCA Now, I would like to quote a post I did some time ago at https://www.researchgate.net/post/Can_anyone_help_me_with_doubts_regarding_dihedral_PCA: ----------- dPCA was developed in the group where I'm working now and I have to say that the way it is described in the GROMACS howto (http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA) feels like a dirty hack. It works, though, if you do it correctly. The trick is that you first do the sin- and cos-transformation of the dihedrals, save them to a kind of "dummy" trajectory, create a kind of "dummy" structure just to get the right number of coordinates and then evaluate it using the standard GROMACS tools. Let me briefly summarize what the tutorial tells you to do (I refer to their numbering) and what is actually meant: A) choose the dihedral angles you need (1. in tutorial, given by 4 atoms per dihedral) and do the sin- and cos-transformation. This is then saved as a new GROMACS trajectory (called dangle.trr in the tutorial). BUT, the values are saved in triples of variables as the trajectory was initially intended for cartesian coordinates, meaning there can be "empty" coordinates which are 0.0 always if the number of coordinates you use is not divisable by 3. B) create a dummy index file which just tells g_covar how many atoms you need for storing all values you have. Step 3 says you need to put integer(2*N/3) atoms in the index file. This means: you have N dihedrals and doing the sin- and cos-transformation, you get 2*N coordinates. They are stored in triples of 3 as mentioned above, so you need M=2*N/3 atoms but you need to round up (indicated by the "integer") to make sure you don't drop one or two coordinates. That's why in the example you need to count up to 4 in the index file (10 coordinates after transformation meaning 3,33 atoms => round up to 4) C) Create also a dummy gro file which also contains M atoms. This can contain any values whatsoever, it just has to be a .gro file with the correct number of atoms. Step 4 in the tutorial does this. D) To actually perform the PCA, you need to diagonalize the covariance matrix of your coordinates. This is done by g_covar in step 5. You need to provide your dummy gro and index files as well as some options to tell the tool not to do fitting, mass weighting or whatever, which is why there are so many flags. If something goes wrong here, I would assume that there is an error in the files you had to create in the steps before. E) To evaluate your PCA, you can plot one- or two-dimensional projections of the data onto the eigenvectors of the covariance matrix (the so-called principal components). This is done in steps 6 to 8. -------- Best Regards, Matthias On 02/28/2017 05:17 PM, Mark Abraham wrote: > Hi, > > On Tue, Feb 28, 2017 at 5:51 AM Anna Vernon <lappala.a...@gmail.com> wrote: > >> Thank you Justin, >> >> that makes sense, thank you for help. However, I am struggling with the >> rest, due to lacking documentation & examples. Like Alex said, this >> web-page http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA is >> outdated, but there is no new version of it available... > > > Well, when you figure out some improvements, please suggest them. > Unfortunately it is often the case with such community projects that > someone writes some code, contributes it, and then disappears leaving > nobody quite sure what it is good for or how it works :-) > > >> Or at least I >> am not finding it (and I have noticed that people on the list have asked >> questions similar to mine, but there were no answers to any of my >> questions, sadly). >> >> mk_angndx -s h3_md.tpr -n h3_ANGLE.ndx -type dihedral >> >> is what I now have done, and my next steps should be the following, I >> guess, >> > > Paraphrasing, the PCA howto at your link says "1. use gmx mk_angndx, 2. use > gmx angle -or dangle.trr, 3. make covar.ndx 4. gmx trjconv -f dangle.trr" > > trjconv -s h3_md.tpr -f dangle.trr -o resized.gro -n covar.ndx -e 0 >> >> but I do not have a trr file because mk_angndx does not generate it >> (unlike the old documentation command angle) > > > mk_angndx never did that. You seem to be skipping the gmx angle command > that makes that trr file. > > >> and again, my tpr file >> contains waters, not just the molecule of interest that I have in my >> .gro file. I also do not understand the difference between the >> dangle.ndx file and covar.ndx file... What is resized.gro? > > > It's a hack that copes with the way somebody thought it was a good idea to > abuse the trr format for storing non-trajectory data, to suit things like > covariance analyses. Steps 3 and 4 seem fairly easy to follow, while I > agree that the reason for doing them is mysterious. :-) > > >> It would be >> really helpful to write out those explanations in documentation and >> perhaps give a more realistic example because this really becomes >> frustrating. >> > > I'm open to suggested improvements, but a "realistic" example with 243 > dihedrals seems unlikely to make the instructions easier to follow... > > Mark > > Thanks for your time & help, it is very much appreciated. >> Anna >> >> >> >> On 27/02/17 19:58, Justin Lemkul wrote: >>> >>> >>> On 2/27/17 9:55 PM, Anna Vernon wrote: >>>> Thanks Justin, >>>> >>>> but I did use this command, which is supposed to be dihedral: >>>> >>>> g_angle -f h3_nowater.gro -n h3_ANGLE.ndx -or h3_ANGLE.trr -type >>>> dihedral >>>> >>> >>> Sure, but g_angle is only going to give you the options presented to >>> it in the index file. If those groups were created incorrectly (note >>> my comments were about mk_angndx, not g_angle) then you have to create >>> them properly. You can't get dihedrals from valence angles :) >>> >>> -Justin >>> >>>> thanks >>>> >>>> A >>>> >>>> >>>> On 27/02/17 19:35, Justin Lemkul wrote: >>>>> >>>>> >>>>> On 2/27/17 9:29 PM, Anna Vernon wrote: >>>>>> Thanks Alex. >>>>>> >>>>>> >>>>>> tried the new syntax, below the output of g_angle, which did not >>>>>> get any >>>>>> easier... Хрень осталась...))) >>>>>> >>>>>> Group 0 (UB_th=108.4_297.062f) has 36 elements >>>>>> Group 1 (UB_th=108.9_384.092f) has 30 elements >>>>>> Group 2 (UB_th=109.7_794.962f) has 6 elements >>>>>> Group 3 (UB_th=111.5_376.562f) has 6 elements >>>>>> Group 4 (UB_th=110.1_221.752f) has 36 elements >>>>>> Group 5 (UB_th=109.0_297.062f) has 33 elements >>>>>> Group 6 (UB_th=109.5_430.952f) has 36 elements >>>>>> Group 7 (UB_th=113.5_585.762f) has 12 elements >>>>>> Group 8 (UB_th=121.0_376.562f) has 12 elements >>>>>> Group 9 (UB_th=119.5_351.462f) has 24 elements >>>>>> Group 10 (UB_th=124.0_669.442f) has 12 elements >>>>>> Group 11 (UB_th=112.5_167.362f) has 12 elements >>>>>> Group 12 (UB_th=121.0_669.442f) has 15 elements >>>>>> Group 13 (UB_th=109.5_276.142f) has 30 elements >>>>>> Group 14 (UB_th=107.0_418.402f) has 12 elements >>>>>> Group 15 (UB_th=112.2_365.682f) has 6 elements >>>>>> Group 16 (UB_th=109.5_271.122f) has 6 elements >>>>>> Group 17 (UB_th=109.5_271.122f) has 12 elements >>>>>> Group 18 (UB_th=110.0_365.682f) has 3 elements >>>>>> Group 19 (UB_th=111.0_292.882f) has 6 elements >>>>>> Group 20 (UB_th=112.2_338.902f) has 3 elements >>>>>> Group 21 (UB_th=111.0_359.822f) has 6 elements >>>>>> Group 22 (UB_th=108.0_401.662f) has 6 elements >>>>>> Group 23 (UB_th=110.1_288.702f) has 6 elements >>>>>> Group 24 (UB_th=115.2_443.502f) has 6 elements >>>>>> Group 25 (UB_th=107.5_433.462f) has 6 elements >>>>>> Group 26 (UB_th=120.0_383.252f) has 12 elements >>>>>> Group 27 (UB_th=120.0_334.722f) has 36 elements >>>>>> Group 28 (UB_th=120.0_251.042f) has 60 elements >>>>>> Group 29 (UB_th=110.1_279.742f) has 18 elements >>>>>> Group 30 (UB_th=105.8_351.462f) has 3 elements >>>>>> Group 31 (UB_th=112.1_343.092f) has 6 elements >>>>>> Group 32 (UB_th=116.5_418.402f) has 3 elements >>>>>> Group 33 (UB_th=122.5_627.602f) has 3 elements >>>>>> Group 34 (UB_th=120.0_418.402f) has 6 elements >>>>>> Group 35 (UB_th=120.0_192.462f) has 3 elements >>>>>> >>>>> >>>>> These are Urey-Bradley angles with the given equilibrium angles and >>>>> force >>>>> constants. If you're using mk_angndx to create the index group, you >>>>> want >>>>> "-type dihedral" instead of "-type angle" (which is the default). >>>>> >>>>> -Justin >>>>> >>>>>> >>>>>> On 27/02/17 16:04, Alex wrote: >>>>>>> I think you might be referring to the 2010 version of the online >>>>>>> manual ( >>>>>>> http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA), which >>>>>>> is pretty >>>>>>> outdated. A lot of syntax seems to be changing fairly rapidly, so >>>>>>> хрень >>>>>>> like this is to be expected. ;) >>>>>>> The current manual points to gmx angle, gmx covar and, gmx anaeig for >>>>>>> covariance analysis, and for those the updated help pages are >>>>>>> separate: >>>>>>> http://manual.gromacs.org/programs/gmx-angle.html >>>>>>> http://manual.gromacs.org/programs/gmx-covar.html >>>>>>> http://manual.gromacs.org/programs/gmx-anaeig.html >>>>>>> >>>>>>> Alex >>>>>>> >>>>>>> On Mon, Feb 27, 2017 at 3:44 PM, Anna Lappala >>>>>>> <lappala.a...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Dear Gromacs users, >>>>>>>> >>>>>>>> The documentation on Dihedral PCA analysis is confusing. >>>>>>>> >>>>>>>> Firstly, the links are redirecting to the main gromacs page >>>>>>>> which does >>>>>>>> not help because one needs to search for the specific commands >>>>>>>> from there. >>>>>>>> >>>>>>>> Secondly, -s option does not work in step 2: invalid command line >>>>>>>> argument >>>>>>>> "-s". >>>>>>>> >>>>>>>> Thirdly, my tpr file contains water which I do not want to >>>>>>>> include in the >>>>>>>> analysis, how can I change that? >>>>>>>> >>>>>>>> .ndx file output produced by mk_angndx is a mess- first line, for >>>>>>>> example, >>>>>>>> is [UB_th=108.4_297.062f] >>>>>>>> >>>>>>>> I need a small number of dihedrals, so I could write the file >>>>>>>> myself but >>>>>>>> there again is no clear explanation how to do it: the [foo] 1 2 3 >>>>>>>> 4 example >>>>>>>> could be expanded- do I use "dihedrals" instead of foo? Or >>>>>>>> filename? Or >>>>>>>> molecule name? >>>>>>>> >>>>>>>> Finally, I would really appreciate it if someone could give an >>>>>>>> explanation >>>>>>>> or a clear example of how this works, for a simple system or a >>>>>>>> protein, >>>>>>>> because documentation is unclear to me. >>>>>>>> >>>>>>>> Kind regards, >>>>>>>> Anna >>>>>>>> -- >>>>>>>> Gromacs Users mailing list >>>>>>>> >>>>>>>> * Please search the archive at http://www.gromacs.org/ >>>>>>>> Support/Mailing_Lists/GMX-Users_List before posting! >>>>>>>> >>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists >>>>>>>> >>>>>>>> * For (un)subscribe requests visit >>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users >>>>>>>> or >>>>>>>> send a mail to gmx-users-requ...@gromacs.org. >>>>>>>> >>>>>> >>>>> >>>> >>> >> >> -- >> Gromacs Users mailing list >> >> * Please search the archive at >> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before >> posting! >> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists >> >> * For (un)subscribe requests visit >> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or >> send a mail to gmx-users-requ...@gromacs.org. -- http://phdcomics.com/comics/archive.php?comicid=1896 -- Gromacs Users mailing list * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting! * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists * For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.