In molecular dynamics, a popular format for writing out the positions of the atoms in a system is the XYZ file format (see: http://en.wikipedia.org/wiki/XYZ_file_format and/or http://www.ks.uiuc.edu/Research/vmd/plugins/molfile/xyzplugin.html). The format allows for storing the positions of the atoms at different snapshots in time (aka "time steps"). Each time step consists of a line giving the number of atoms, a comment line, and then one line per atom. You may have anywhere from a few to millions of atoms in your system, and you may have thousands of time steps represented in the file, so it is easy to end up with a single file that is many GB in size. Here is a shell command that will create a very simple, and very small, test file (note that the positions of the atoms are completely unrealistic: they are all sitting on top of each other):

perl -e 'open(F, ">>test1.xyz"); for( $t= 1; $t < 11; $t = $t +1){print F "10\n\n"; for( $a = 1; $a < 11; $a = $a + 1 ){print F "C 0.000 0.000 0.0000\n";}}; close(F);'

Here is a shell command that will produce a more complicated file structure. Note that, depending on who wrote the code that output the file, there may be other columns of data at the end of each row, and the number of decimal places kept and the type of spacing between elements may also change. This file has a different number of atoms at each time step:

perl -e 'open(F, ">>test2.xyz"); for( $t= 1; $t < 5; $t = $t +1){my $s= $t + 10; print F "$s \n"; my $color = substr ("abcd efghij klmno pqrs tuv wxyz", int(rand(10)), int(rand(10))); print F $color; print F "\n" ;for( $a = 1; $a < (11 +$t); $a = $a + 1 ){print F "C 10.000000 10.00000 10.00000 $a\n";}}; close(F);'

Ok, that is the background to get to my question. I need a way to parse these files and group the lines into time steps. I currently have something that works, but only when the file is relatively small, because it reads the whole file into memory. I would like to use something like iota that will allow me to lazily parse the file and run reducers on the data. Any help would be really appreciated.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
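
P.S. To make the grouping I am after concrete, here is a minimal sketch using only plain lazy sequences from clojure.core (line-seq), not iota or reducers. It assumes every time step is well formed: a line holding the atom count, then a comment line, then exactly that many atom lines. The names `frames` and `atoms-per-frame` are made up for illustration, and I have not tried this on a multi-GB file.

```clojure
(ns xyz-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn frames
  "Lazily groups a seq of XYZ lines into time steps.
  Each time step becomes a map with the atom count, the comment
  line, and a vector of the raw atom lines.
  Assumption: the input is well formed (no trailing junk lines)."
  [lines]
  (lazy-seq
    (when-let [[count-line & more] (seq lines)]
      (let [n                    (Integer/parseInt (str/trim count-line))
            [comment-line & body] more
            ;; split-at is lazy: only the n atom lines of this
            ;; time step are realized by the vec call below.
            [atoms remaining]    (split-at n body)]
        (cons {:n n :comment comment-line :atoms (vec atoms)}
              (frames remaining))))))

(defn atoms-per-frame
  "Hypothetical usage example: returns a vector with the number of
  atoms in each time step. mapv is eager, so all reading happens
  while the reader is still open."
  [path]
  (with-open [rdr (io/reader path)]
    (mapv :n (frames (line-seq rdr)))))
```

For a freshly created test2.xyz, `(atoms-per-frame "test2.xyz")` should return `[11 12 13 14]`. Only one time step needs to be in memory at a time as long as nothing holds on to the head of the sequence, but this still walks the file sequentially, which is why I am asking about iota and reducers.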