----- Original Message ----- From: "Aditi Gupta" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <beginners@perl.org>
Sent: Monday, May 09, 2005 11:52 AM
Subject: Re: extracting coordinates
Hi,

The fields cannot be split on \s because some fields share a boundary with no whitespace between them, e.g. one field occupies columns 31-38 and the next starts at column 39. From the format description:

COLUMNS   DATA TYPE     FIELD     DEFINITION
---------------------------------------------------------------------------------
 1 -  6   Record name   "ATOM "
 7 - 11   Integer       serial    Atom serial number.

These are the first two fields.

On 5/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
----- Original Message ----- From: Aditi Gupta <[EMAIL PROTECTED]> Date: Monday, May 9, 2005 11:41 am Subject: extracting coordinates
> Hi everyone,

Hello Aditi,
Hello,
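while (<DATA>) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @fields = split ' ';
    # capture the first three consecutive decimal numbers; the minus sign is optional
    if (@fields > 2 and $fields[2] eq "CA"
        and /(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)/) {
        print "$1, $2, $3\n";
    }
}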
__DATA__
HELIX 4 4 VAL 74 LEU 84 1 11
CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX2 0.000000 1.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 0.030285 0.000000 0.000000 0.00000
SCALE2 0.000000 0.029630 0.000000 0.00000
SCALE3 0.000000 0.000000 0.013215 0.00000
ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
>
> That code is working...
> But my specific problem is as follows:
>
> i have a file in which data is stored as
>
> HELIX 4 4 VAL 74 LEU 84 1 11
> CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
> ORIGX1 1.000000 0.000000 0.000000 0.00000
> ORIGX2 0.000000 1.000000 0.000000 0.00000
> ORIGX3 0.000000 0.000000 1.000000 0.00000
> SCALE1 0.030285 0.000000 0.000000 0.00000
> SCALE2 0.000000 0.029630 0.000000 0.00000
> SCALE3 0.000000 0.000000 0.013215 0.00000
> ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
> ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
> ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
> ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
> ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
> ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
> ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
> ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
> ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
> ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
> ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
>
> the fields of records having ATOM as 1st field are as follows:
>
> COLUMNS   DATA TYPE      FIELD        DEFINITION
> ---------------------------------------------------------------------------------
>  1 -  6   Record name    "ATOM "
>  7 - 11   Integer        serial       Atom serial number.
> 13 - 16   Atom           name         Atom name.
> 17        Character      altLoc       Alternate location indicator.
> 18 - 20   Residue name   resName      Residue name.
> 22        Character      chainID      Chain identifier.
> 23 - 26   Integer        resSeq       Residue sequence number.
> 27        AChar          iCode        Code for insertion of residues.
> 31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
> 39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
> 47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
> 55 - 60   Real(6.2)      occupancy    Occupancy.
> 61 - 66   Real(6.2)      tempFactor   Temperature factor.
> 73 - 76   LString(4)     segID        Segment identifier, left-justified.
> 77 - 78   LString(2)     element      Element symbol, right-justified.
> 79 - 80   LString(2)     charge       Charge on the atom.
>
> I have to get the x,y,z coordinates of records whose atom name is
> 'CA'(highlighted as blue).
>
> I wrote a code but its giving many errors..
>
> The code is:
>
> #!usr/bin/perl
> use warnings;
>
> $filename = "1a32.txt";
> chomp $filename;
The chomp line above is unnecessary: $filename is assigned a string literal, so there is no trailing newline to remove. See perldoc -f chomp.
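To illustrate the chomp remark, here is a minimal sketch. The variable names are made up for the example, and the second string only imitates what a line read from a filehandle or STDIN would look like.

```perl
use strict;
use warnings;

my $literal = "1a32.txt";                   # string literal: no trailing newline
my $removed = chomp(my $copy = $literal);   # chomp returns the number of characters removed
print "removed from literal: $removed\n";

my $typed = "1a32.txt\n";                   # hypothetical: a name as read from input
$removed = chomp $typed;
print "removed from input: $removed, now '$typed'\n";
```

So chomp only earns its place when the string actually came from input.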
> open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> @file= <FILEHANDLE>;
> close (FILEHANDLE);
>
> $a= "ATOM";
> $c= "CA";
>
> foreach $line(@file)
> {
> if(my $line =~ /^/$a/\s*
> (\s*\d+)
> \s*/$c/\s*
> \d*
> \w+
> \s
> \w
> (\s*\d+)
> \w*
> (\s*\d*)
> (\s*\d*)
> (\s*\d*)
> (\s*\d*)
> (\s*\d*)
> (\w*\s*)
> (\s*\w*)
> (\s*\w*)/)
Ouch, that is way too long a regular expression (at least for me). You might consider a shorter global match such as my @fields = $line =~ /(\S+)/g. In any case I think split would work best here, e.g. my @fields = split ' ', $line, since your fields are locked into place. In general you pad the data until every record is uniform, as yours already is. Below is some simple code that should help you on your way; feel free to modify it at will.
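One detail worth checking when using split on padded records: the pattern /\s/ splits at every single whitespace character, while the special pattern ' ' splits on runs of whitespace. A small sketch, using a made-up ATOM line padded with several spaces (the spacing is an assumption, not copied from the real file):

```perl
use strict;
use warnings;

# Hypothetical ATOM record padded with runs of spaces.
my $line = "ATOM      2  CA  LEU     2      -9.711 -13.341  53.419\n";

my @single = split /\s/, $line;   # splits at each whitespace character: empty fields appear
my @runs   = split ' ',  $line;   # skips leading whitespace and splits on runs of whitespace

print scalar(@single), " fields with /\\s/, field 2 is '$single[2]'\n";
print scalar(@runs),   " fields with ' ',   field 2 is '$runs[2]'\n";
```

With the padded line, only the ' ' form leaves the atom name in field 2.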
>
> {
> my $x= substr($line,30,8);
> my $y= substr($line,38,8);
> my $z= substr($line,46,8);
>
> print "$x\t$y\t$z\n";
> }
> }
>
> #-------------------------------------------------
>
>
>
> The errors that i'm getting are:
>
> Scalar found where operator expected at two.pl line 15, near "/^/$a"
> (Missing operator before $a?)
> Unrecognized escape \d passed through at two.pl line 15.
> Unrecognized escape \s passed through at two.pl line 15.
> Unrecognized escape \w passed through at two.pl line 17.
> Unrecognized escape \s passed through at two.pl line 17.
> Unrecognized escape \w passed through at two.pl line 17.
> Unrecognized escape \s passed through at two.pl line 17.
> Backslash found where operator expected at two.pl line 22, near
> "(\s*\" (Might be a runaway multi-line ** string starting on line 17)
> (Missing operator before \?)
> Unquoted string "d" may clash with future reserved word at two.pl
> line 22.
> Backslash found where operator expected at two.pl line 23, near ")
> \"
> (Missing operator before \?)
> Unquoted string "w" may clash with future reserved word at two.pl
> line 23.
> Unrecognized escape \s passed through at two.pl line 24.
> Backslash found where operator expected at two.pl line 25, near
> "(\s*\" (Might be a runaway multi-line ** string starting on line 24)
> (Missing operator before \?)
> Unquoted string "d" may clash with future reserved word at two.pl
> line 25.
> Unrecognized escape \s passed through at two.pl line 26.
> Backslash found where operator expected at two.pl line 27, near
> "(\s*\" (Might be a runaway multi-line ** string starting on line 26)
> (Missing operator before \?)
> Unquoted string "d" may clash with future reserved word at two.pl
> line 27.
> Unrecognized escape \w passed through at two.pl line 28.
> Backslash found where operator expected at two.pl line 29, near
> "(\w*\" (Might be a runaway multi-line ** string starting on line 28)
> (Missing operator before \?)
> Unrecognized escape \w passed through at two.pl line 29.
> syntax error at two.pl line 15, near "/^/$a"
> Substitution replacement not terminated at two.pl line 31.
>
> Please help me..
>
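#!/usr/bin/perl
use warnings;
use strict;

my $filename = "1a32.txt";
my $Atom = 'CA';

open (FILEHANDLE, $filename) or die "couldn't open $filename: $!";
my @file = <FILEHANDLE>;
close (FILEHANDLE);

foreach my $line ( @file ){
    # split ' ' splits on runs of whitespace, so padding yields no empty fields
    my @fields = split ' ', $line;
    next unless @fields >= 8 and $fields[0] eq 'ATOM';
    # in the sample data above, x, y, z are the 6th to 4th fields from the end
    print "X: $fields[-6] Y: $fields[-5] Z: $fields[-4]\n" if uc $fields[2] eq $Atom;
}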
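Since the follow-up at the top of this thread points out that neighbouring fields can touch with no whitespace between them, here is a hedged sketch of the column-based alternative, using substr with the offsets from the column table quoted above. The helper name ca_coords and the sample records are invented for illustration; the records are built with sprintf to match that column layout rather than taken from the real 1a32.txt, and the second one has a made-up y value wide enough to touch the x field.

```perl
use strict;
use warnings;

# Return (x, y, z) for a CA ATOM record, or an empty list otherwise.
# substr offsets are 0-based, so column 31 is offset 30.
sub ca_coords {
    my ($line) = @_;
    return unless length($line) >= 54;              # too short to hold x, y, z
    return unless substr($line, 0, 6) eq 'ATOM  ';  # columns 1-6: record name
    (my $name = substr($line, 12, 4)) =~ s/\s+//g;  # columns 13-16: atom name, padding removed
    return unless $name eq 'CA';
    # columns 31-38, 39-46, 47-54; adding 0 turns the padded text into a number
    return map { substr($line, $_, 8) + 0 } 30, 38, 46;
}

# Hypothetical records laid out to the column table quoted in this thread.
my $fmt = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n";
my @records = (
    sprintf($fmt, 'ATOM', 1, ' N',  '', 'LEU', '', 2, '', -10.586, -14.055,  54.397, 1, 49.37),
    sprintf($fmt, 'ATOM', 2, ' CA', '', 'LEU', '', 2, '', -9.711,  -113.341, 53.419, 1, 48.40),
);

for my $line (@records) {
    my @xyz = ca_coords($line) or next;   # skip anything that is not a CA atom
    print join("\t", @xyz), "\n";
}
```

The trade-off is that this depends on the file keeping its original column alignment, whereas split depends on whitespace between every pair of fields.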
HTH, Mark G.