parsing and chunking large xyz files

cej38 Fri, 26 Dec 2014 06:50:09 -0800

In molecular dynamics, a popular format for writing out the positions of the
atoms in a system is the xyz file format (see
http://en.wikipedia.org/wiki/XYZ_file_format and/or
http://www.ks.uiuc.edu/Research/vmd/plugins/molfile/xyzplugin.html).  The
format can store the positions of the atoms at different snapshots in time
(aka "time steps").  A system may contain anywhere from a few atoms to
millions, and a file may hold thousands of time steps, so it is easy to end
up with a single file that is many GB in size.  Here is a shell command that
will create a very simple, and very small, test file (note that the positions
of the atoms are completely unrealistic: they are all sitting on top of each
other):


perl -e 'open(F, ">>test1.xyz"); for ($t = 1; $t < 11; $t++) {
  print F "10\n\n";
  for ($a = 1; $a < 11; $a++) { print F "C  0.000 0.000 0.0000\n"; } }
  close(F);'


Here is a shell command that will produce a more complicated file structure.
Depending on who wrote the code that output the file, there may be other
columns of data at the end of each row, and the number of decimal places kept
and the type of spacing between elements may vary.  This file has a different
number of atoms in each time step:

perl -e 'open(F, ">>test2.xyz"); for ($t = 1; $t < 5; $t++) {
  my $s = $t + 10; print F "$s \n";
  my $color = substr("abcd efghij klmno pqrs tuv wxyz", int(rand(10)),
                     int(rand(10)));
  print F $color; print F "\n";
  for ($a = 1; $a < (11 + $t); $a++) {
    print F "C    10.000000   10.00000   10.00000   $a\n"; } }
  close(F);'

OK, that is the background to my question.  I need a way to parse these
files and group the lines into time steps.  I currently have something that
works, but only when the file is relatively small, because it reads the
whole file into memory.  I would like to use something like iota that will
allow me to lazily parse the file and run reducers on the data.  Any help
would be really appreciated.
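
To make the goal concrete, here is a minimal sketch of the kind of lazy
grouping described above. It is written in Python purely for illustration;
it is not Clojure and does not use iota or reducers. It assumes each time
step consists of an atom-count line, one comment line, and then exactly that
many atom lines, with any extra columns kept as strings. The names
parse_atom, iter_frames and count_atoms are made up for this example.

```python
from itertools import islice


def parse_atom(line):
    """Split one atom row into (element, (x, y, z), extra_columns).

    Whitespace-tolerant; anything after the third coordinate is kept
    as a list of raw strings (assumption: extras need no conversion).
    """
    fields = line.split()
    xyz = tuple(float(v) for v in fields[1:4])
    return fields[0], xyz, fields[4:]


def iter_frames(lines):
    """Lazily yield (n_atoms, comment, atoms) one time step at a time.

    `lines` is any iterable of text lines, e.g. an open file object,
    so only the current time step is held in memory.
    """
    lines = iter(lines)
    for header in lines:
        if not header.strip():
            continue  # tolerate stray blank lines between time steps
        n_atoms = int(header)  # int() ignores surrounding whitespace
        comment = next(lines, "").rstrip("\n")
        atoms = [parse_atom(row) for row in islice(lines, n_atoms)]
        if len(atoms) != n_atoms:
            raise ValueError("file ended in the middle of a time step")
        yield n_atoms, comment, atoms


def count_atoms(path):
    """Example reduction: total atom rows across all time steps."""
    with open(path) as handle:
        return sum(n for n, _, _ in iter_frames(handle))
```

Because iter_frames is a generator, a reduction such as count_atoms walks
the file once and never holds more than one time step in memory. Something
with the same shape, but built on iota and reducers, is what I am after.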




-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
