----- Original Message -----
From: Aditi Gupta <[EMAIL PROTECTED]>
Date: Monday, May 9, 2005 12:52 pm
Subject: Re: extracting coordinates
> hi,

Hello,

> the fields can not be split using \s because some fields have common
> boundaries, i.e. some fields are from column 31-38 and the next field
> starts from 39.. As in
> COLUMNS   DATA TYPE     FIELD       DEFINITION

That may well be the case, but from examining the data [ in all of your
examples ], splitting on whitespace will work. If not, can you show us a
case where it won't work?

Mark G.

> ------------------------------------------------------------------------
>  1 -  6   Record name   "ATOM  "
>
>  7 - 11   Integer       serial      Atom serial number.
>
> These are the first two fields.
> On 5/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> > ----- Original Message -----
> > From: Aditi Gupta <[EMAIL PROTECTED]>
> > Date: Monday, May 9, 2005 11:41 am
> > Subject: extracting coordinates
> >
> > > Hi everyone,
> > Hello Aditi,
> >
> > > That code is working...
> > > But my specific problem is as follows:
> > >
> > > I have a file in which data is stored as
> > >
> > > HELIX 4 4 VAL 74 LEU 84 1 11
> > > CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
> > > ORIGX1 1.000000 0.000000 0.000000 0.00000
> > > ORIGX2 0.000000 1.000000 0.000000 0.00000
> > > ORIGX3 0.000000 0.000000 1.000000 0.00000
> > > SCALE1 0.030285 0.000000 0.000000 0.00000
> > > SCALE2 0.000000 0.029630 0.000000 0.00000
> > > SCALE3 0.000000 0.000000 0.013215 0.00000
> > > ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
> > > ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
> > > ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
> > > ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
> > > ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
> > > ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
> > > ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
> > > ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
> > > ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
> > > ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
> > > ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
> > >
> > > The fields of records having ATOM as the first field are as follows:
> > >
> > > COLUMNS   DATA TYPE     FIELD       DEFINITION
> > > ------------------------------------------------------------------------
> > >  1 -  6   Record name   "ATOM  "
> > >  7 - 11   Integer       serial      Atom serial number.
> > > 13 - 16   Atom          name        Atom name.
> > > 17        Character     altLoc      Alternate location indicator.
> > > 18 - 20   Residue name  resName     Residue name.
> > > 22        Character     chainID     Chain identifier.
> > > 23 - 26   Integer       resSeq      Residue sequence number.
> > > 27        AChar         iCode       Code for insertion of residues.
> > > 31 - 38   Real(8.3)     x           Orthogonal coordinates for X in Angstroms.
> > > 39 - 46   Real(8.3)     y           Orthogonal coordinates for Y in Angstroms.
> > > 47 - 54   Real(8.3)     z           Orthogonal coordinates for Z in Angstroms.
> > > 55 - 60   Real(6.2)     occupancy   Occupancy.
> > > 61 - 66   Real(6.2)     tempFactor  Temperature factor.
> > > 73 - 76   LString(4)    segID       Segment identifier, left-justified.
> > > 77 - 78   LString(2)    element     Element symbol, right-justified.
> > > 79 - 80   LString(2)    charge      Charge on the atom.
> > >
> > > I have to get the x,y,z coordinates of records whose atom name is
> > > 'CA' (highlighted in blue).
> > >
> > > I wrote some code but it's giving many errors..
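Since the table above gives a fixed position for every field, another option is to let unpack cut each ATOM record along those columns. This is a minimal sketch, assuming standard fixed-width PDB lines. parse_atom is a hypothetical helper, and the sample line is the CA record of residue 2 from your data, laid out with sprintf in the table's columns; altLoc, chainID and iCode are assumed blank because the original spacing was lost in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names and an unpack template that mirror the column table
# (1-based columns there; "x" skips one unused column, "x3" skips three).
my @names    = qw(record serial name altLoc resName chainID resSeq iCode
                  x y z occupancy tempFactor);
my $template = 'A6 A5 x A4 A1 A3 x A1 A4 A1 x3 A8 A8 A8 A6 A6';

# Hypothetical helper: split one ATOM record into a hash of named fields.
sub parse_atom {
    my ($line) = @_;
    chomp $line;
    # Pad to 80 columns so unpack never runs past the end of a short line.
    $line = sprintf '%-80s', $line;
    my %f;
    @f{@names} = unpack $template, $line;
    # "A" already drops trailing blanks; strip the leading ones too,
    # since numeric fields are right-justified.
    s/^\s+// for values %f;
    return \%f;
}

# Sample record in fixed columns (assumed blank altLoc/chainID/iCode).
my $sample = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n",
    'ATOM', 2, ' CA', '', 'LEU', '', 2, '', -9.711, -13.341, 53.419, 1.00, 48.40;

# Usage: the in-memory filehandle stands in for 1a32.txt.
open my $fh, '<', \$sample or die "cannot open sample: $!";
while (my $line = <$fh>) {
    next unless $line =~ /^ATOM  /;
    my $atom = parse_atom($line);
    print join("\t", @{$atom}{qw(x y z)}), "\n" if $atom->{name} eq 'CA';
}
close $fh;
```

Reading against 1a32.txt would only change the open call; the template itself carries all of the column bookkeeping.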
> > >
> > > The code is:
> > >
> > > #!usr/bin/perl
> > > use warnings;
> > >
> > > $filename = "1a32.txt";
> > > chomp $filename;
> >
> > The above line is unnecessary: $filename holds a literal string with no
> > trailing newline, so there is nothing to chomp (see perldoc -f chomp).
> >
> > > open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> > > @file= <FILEHANDLE>;
> > > close (FILEHANDLE);
> > >
> > > $a= "ATOM";
> > > $c= "CA";
> > >
> > > foreach $line(@file)
> > > {
> > > if(my $line =~ /^/$a/\s*
> > > (\s*\d+)
> > > \s*/$c/\s*
> > > \d*
> > > \w+
> > > \s
> > > \w
> > > (\s*\d+)
> > > \w*
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\w*\s*)
> > > (\s*\w*)
> > > (\s*\w*)/)
> >
> > Youch, that is way too long a regular expression [ at least for me ]; you
> > may consider a shorter version such as my @fields = $line =~ /(\S+)/g.
> > In any case I think split would work best here [ my @fields = split ' ',
> > $line ], since your fields are locked into place. In general, data like
> > this is padded until it is all uniform, as yours is. Below is some simple
> > code that should help you on your way, feel free to modify at will.
> >
> > > {
> > > my $x= substr($line,30,8);
> > > my $y= substr($line,38,8);
> > > my $z= substr($line,46,8);
> > >
> > > print "$x\t$y\t$z\n";
> > > }
> > > }
> > >
> > > #-------------------------------------------------
> > >
> > > The errors that I'm getting are:
> > >
> > > Scalar found where operator expected at two.pl line 15, near "/^/$a"
> > > (Missing operator before $a?)
> > > Unrecognized escape \d passed through at two.pl line 15.
> > > Unrecognized escape \s passed through at two.pl line 15.
> > > Unrecognized escape \w passed through at two.pl line 17.
> > > Unrecognized escape \s passed through at two.pl line 17.
> > > Unrecognized escape \w passed through at two.pl line 17.
> > > Unrecognized escape \s passed through at two.pl line 17.
> > > Backslash found where operator expected at two.pl line 22, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 17)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl line 22.
> > > Backslash found where operator expected at two.pl line 23, near ")
> > > \"
> > > (Missing operator before \?)
> > > Unquoted string "w" may clash with future reserved word at two.pl line 23.
> > > Unrecognized escape \s passed through at two.pl line 24.
> > > Backslash found where operator expected at two.pl line 25, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 24)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl line 25.
> > > Unrecognized escape \s passed through at two.pl line 26.
> > > Backslash found where operator expected at two.pl line 27, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 26)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl line 27.
> > > Unrecognized escape \w passed through at two.pl line 28.
> > > Backslash found where operator expected at two.pl line 29, near
> > > "(\w*\" (Might be a runaway multi-line ** string starting on line 28)
> > > (Missing operator before \?)
> > > Unrecognized escape \w passed through at two.pl line 29.
> > > syntax error at two.pl line 15, near "/^/$a"
> > > Substitution replacement not terminated at two.pl line 31.
> > >
> > > Please help me..
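About the errors: in /^/$a/\s* the second slash already ends the match, so perl then finds $a where it expects an operator ("Scalar found where operator expected ... near "/^/$a""), and most of the remaining messages come from perl trying to parse the rest of the pattern as ordinary code. Variables interpolate inside a pattern without extra slashes around them. Two more things to watch: my $line inside the if declares a new, empty $line instead of using the loop variable, and $a and $b are special to sort, so other names are safer. Below is a minimal sketch of a much shorter pattern, assuming the fixed columns from the table (record name in 1-6, atom name in 13-16, coordinates in 31-54). ca_coords, $record and $name are hypothetical names, and the sample line is the CA record of residue 3 from your data with altLoc, chainID and iCode assumed blank.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = 'ATOM';   # not $a: $a and $b are reserved for sort
my $name   = 'CA';

# Hypothetical helper: returns (x, y, z) for a CA ATOM line, else ().
sub ca_coords {
    my ($line) = @_;
    my $rec6 = sprintf '%-6s', $record;   # "ATOM" -> "ATOM  " (columns 1-6)
    # No slashes around the variable: it interpolates on its own, and
    # \Q...\E keeps any regex metacharacters in it literal.
    # .{6} = columns 7-12, (.{4}) = atom name, .{14} = columns 17-30,
    # then three 8-character coordinate fields (columns 31-54).
    my ($atom, @xyz) =
        $line =~ /^\Q$rec6\E.{6}(.{4}).{14}(.{8})(.{8})(.{8})/
        or return;
    s/^\s+|\s+$//g for $atom, @xyz;       # trim blanks from each field
    return $atom eq $name ? @xyz : ();
}

# Usage on one record laid out in the table's columns.
my $sample = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n",
    'ATOM', 10, ' CA', '', 'THR', '', 3, '', -10.405, -9.634, 52.826, 1.00, 40.93;
my @coords = ca_coords($sample);
print join("\t", @coords), "\n" if @coords;
```

Because the pattern only counts characters between fixed columns, it keeps working even when two neighbouring fields touch.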
> >
> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> > my $filename = "1a32.txt";
> > my $Atom = 'CA';
> >
> > open (my $fh, '<', $filename) or die "couldn't open $filename: $!";
> > my @file = <$fh>;
> > close ($fh);
> >
> > foreach my $line ( @file ){
> >
> >     # split ' ' splits on runs of whitespace; split /\s/ would leave
> >     # empty fields wherever several blanks follow each other
> >     my @fields = split ' ', $line;
> >     next unless @fields >= 6 && $fields[0] eq 'ATOM';
> >     # counting from the end: -1 element, -2 tempFactor, -3 occupancy,
> >     # so z, y, x are -4, -5, -6 (assumes the element column is present)
> >     print "X: $fields[-6] Y: $fields[-5] Z: $fields[-4]\n" if uc $fields[2] eq $Atom;
> >
> > }
> >
> > HTH,
> > Mark G.
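On the question of a case where whitespace splitting breaks: x, y and z are Real(8.3) fields in adjacent columns (31-38, 39-46, 47-54), so when y or z needs all 8 characters, such as -110.123, it runs straight into the coordinate before it with no blank in between. The sketch below compares split ' ' with substr at the table's offsets on a hypothetical line (not taken from 1a32.txt); the coordinate values and the helper names xyz_by_split and xyz_by_columns are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ATOM line: y = -110.123 fills all of columns 39-46,
# so nothing separates it from x in columns 31-38.
my $line = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n",
    'ATOM', 2, ' CA', '', 'LEU', '', 2, '', -100.586, -110.123, 53.419, 1.00, 48.40;

# Whitespace split: x and y come back glued together as one field.
sub xyz_by_split {
    my @fields = split ' ', shift;
    return @fields[5 .. 7];
}

# Fixed columns: 0-based substr offsets 30, 38 and 46, width 8 each.
sub xyz_by_columns {
    my $l   = shift;
    my @xyz = map { substr($l, $_, 8) } 30, 38, 46;
    s/^\s+// for @xyz;    # drop right-justification padding
    return @xyz;
}

print "split:   @{[ xyz_by_split($line) ]}\n";
print "columns: @{[ xyz_by_columns($line) ]}\n";
```

Every whitespace index after the merged field shifts by one (here split returns 53.419 and 1.00 as y and z), while the column offsets, the same ones used in the substr lines of the original code, still give the right values.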