Re: [msvc] Management of large vectors

daniel . robinson Wed, 27 Oct 2004 01:55:06 -0700

>It is great to see I'm not the only chemist in the list :-)

We appear to be an endangered, and underpaid, species :)


>The high number of molecules (mostly proteins) is the result of molecular
>dynamics calculations.

Ahh well, if you're storing proteins then I'll save you the effort of looking
up either SMILES or CORINA. Neither will sensibly represent a complete protein
structure, as they will destroy the 3D information that you've just spent
hours calculating via MD runs!

>Latest dynamics we have run produced about 28000 conformers.

What did you use to create this dynamics run? And what are you doing to
the output?
I'm just thinking that most MD programs, such as CHARMm or NAMD, will save a
trajectory file (DCD file), which is nice and binary, and more compact than
the collection of text-based PDB and MOL2 files that you describe later.
The code for reading and writing them is publicly available (as part of
the NAMD code).

>Our application (which I'm not writing myself but a colleague of mine)
>computes NMR properties such as NOE effects, cross-correlation, 3J scalar
>couplings and residual dipolar coupling constants.

I'll pretend that I remember what all of those are from my days as an
undergraduate ;)

>The molecules are in PDB format (and sometimes in MOL format)

Like I said before, both of these are text formats, and they are horrifically
verbose for storing 28000 copies of a large molecule.
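
To put rough numbers on that, here is a back-of-envelope sketch. It assumes a
nominal 80-column PDB ATOM record plus a newline, against three 4-byte floats
per atom for raw binary coordinates. The 1000-atom protein is a made-up figure,
and headers, connectivity records and any file framing are ignored.

```cpp
#include <cstdint>

// Nominal size of one PDB ATOM record: 80 columns plus a newline
// (an assumption; some writers trim trailing blanks).
constexpr std::uint64_t kTextBytesPerAtom = 81;

// Raw single-precision x, y, z: 3 * 4 bytes.
constexpr std::uint64_t kBinaryBytesPerAtom = 12;

// Total bytes needed to hold every atom of every conformer as text records.
constexpr std::uint64_t textBytes(std::uint64_t atoms, std::uint64_t conformers) {
    return atoms * conformers * kTextBytesPerAtom;
}

// Total bytes needed to hold only the coordinates, in binary.
constexpr std::uint64_t binaryBytes(std::uint64_t atoms, std::uint64_t conformers) {
    return atoms * conformers * kBinaryBytesPerAtom;
}

// Hypothetical 1000-atom protein, 28000 conformers.
static_assert(textBytes(1000, 28000) == 2268000000ULL, "about 2.3 GB as text");
static_assert(binaryBytes(1000, 28000) == 336000000ULL, "about 0.34 GB as binary");
```

Under those assumptions the text form is 6.75 times larger, before counting
any repeated connectivity records.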

The connectivity information for a protein need not be stored on a 'per-atom'
basis. Remember that proteins are made up of standard residues connected in a
standard manner. If you think about how programs like CHARMM work, they store
a residue topology file that defines the bonding for each residue type once.
This information is then reused each time. So it seems like you could do the
same and store only 20 sets of connectivity data (one for each residue type)
rather than the many thousands that you'll be saving for a typical protein.
When multiplied across 28000 conformers this could amount to a significant
saving. As I said in my first email, the connectivity information will not
change from one conformer to another. All you need to store is the difference
in atom positions.
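
As a rough sketch of that layout (all type and member names here are made up
for illustration, not taken from CHARMM, NAMD or your application): the bonding
is held once per residue type, the sequence is held once per protein, and each
conformer contributes nothing but a block of coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Bonding for one residue type, stored once (the residue-topology idea).
struct ResidueTemplate {
    std::vector<std::string> atomNames;
    std::vector<std::pair<int, int> > bonds;  // index pairs into atomNames
};

// Everything that is identical across conformers.
struct Topology {
    std::map<std::string, ResidueTemplate> templates;  // keyed by residue name
    std::vector<std::string> sequence;                 // residue name per position

    // Number of atoms implied by the sequence and the templates.
    std::size_t atomCount() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < sequence.size(); ++i) {
            std::map<std::string, ResidueTemplate>::const_iterator it =
                templates.find(sequence[i]);
            assert(it != templates.end());  // every residue needs a template
            n += it->second.atomNames.size();
        }
        return n;
    }
};

// One shared topology plus per-conformer coordinates only.
struct Ensemble {
    Topology topology;
    std::vector<std::vector<Vec3> > conformers;

    void addConformer(const std::vector<Vec3>& positions) {
        assert(positions.size() == topology.atomCount());
        conformers.push_back(positions);
    }
};
```

Bonds between neighbouring residues are not stored here; in this sketch they
are assumed to follow from the sequence order.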

>In the program we have classes rather than structs. So the program reads
>the pdb files (normally a single PDB file with all the conformers embedded
>in the same file) and creates the list of molecules.

Again I'm going to have to question why you need to load all 28000 in at
one time?

> Some calculations have
> to be applied to all the conformers whilst other can be applied to single
> conformers (e.g. the currently displayed one).

Whilst I can see that you will need to do some averaging across the various
conformers, I still fail to see why this will require you to have more than
the current 'working' molecule in memory at any one time. I could be ignorant
of the details of your calculation.
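
A minimal sketch of what I mean, assuming a multi-model PDB file in which each
conformer ends with an ENDMDL record and coordinates sit in the standard
columns 31-54. The function names are hypothetical, and the distance between
the first two atoms is only a stand-in for whatever per-conformer quantity
your NMR calculation really needs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Vec3 { double x, y, z; };

// Reads the next model into 'atoms', reusing its storage.
// Returns false once the stream holds no further atoms.
bool readNextModel(std::istream& in, std::vector<Vec3>& atoms) {
    atoms.clear();
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "ENDMDL") == 0) return true;
        bool isAtom = line.compare(0, 6, "ATOM  ") == 0 ||
                      line.compare(0, 6, "HETATM") == 0;
        if (isAtom && line.size() >= 54) {
            Vec3 p;
            p.x = std::stod(line.substr(30, 8));  // columns 31-38
            p.y = std::stod(line.substr(38, 8));  // columns 39-46
            p.z = std::stod(line.substr(46, 8));  // columns 47-54
            atoms.push_back(p);
        }
    }
    return !atoms.empty();  // file without MODEL/ENDMDL records
}

// Running mean of a per-conformer quantity; one conformer in memory at a time.
template <typename Property>
double ensembleAverage(std::istream& in, Property property) {
    std::vector<Vec3> atoms;
    double mean = 0.0;
    std::size_t n = 0;
    while (readNextModel(in, atoms)) {
        ++n;
        mean += (property(atoms) - mean) / static_cast<double>(n);
    }
    return mean;
}

// Stand-in property: distance between the first two atoms.
double firstPairDistance(const std::vector<Vec3>& atoms) {
    assert(atoms.size() >= 2);
    double dx = atoms[1].x - atoms[0].x;
    double dy = atoms[1].y - atoms[0].y;
    double dz = atoms[1].z - atoms[0].z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Convenience wrapper for trying the sketch on in-memory text.
double averageFromText(const std::string& pdbText) {
    std::istringstream in(pdbText);
    return ensembleAverage(in, firstPairDistance);
}
```

For a real run you would pass a std::ifstream opened on the trajectory instead
of a string. Memory use then depends on the size of one conformer, not on how
many conformers the file holds.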

> We also need to allow the
> user to see all the conformers displayed one over the other.

I personally doubt that this feature is going to be useful. Displaying
thousands of conformers simultaneously is going to be meaningless (the screen
will just be a mass of inseparable atoms) and hideously slow (I've never
managed to display more than a few hundred molecules simultaneously without
things grinding to a halt, despite using pretty expensive hardware). Of course
I can see that you might want to animate across your series of conformations,
so the user can view the dynamics, but again this only needs one molecule to
be loaded and displayed at any one time.

Daniel







