Re: extracting coordinates

mgoland Mon, 09 May 2005 13:21:08 -0700


----- Original Message -----
From: Aditi Gupta <[EMAIL PROTECTED]>
Date: Monday, May 9, 2005 12:52 pm
Subject: Re: extracting coordinates


> hi, 
Hello,

> the fields cannot be split using /\s/ because some fields have common
> boundaries, i.e. some fields are from column 31-38 and the next field
> starts from 39. As in
>  COLUMNS   DATA TYPE      FIELD       DEFINITION
That can very well be the case, but from examining the data [ in all of your
examples ], splitting on whitespace will work. If not, can you tell us a case
where it won't work?
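
For comparison, here is a minimal sketch of the column-based approach next to
the whitespace split. The record below is hypothetical (it is not from the
posted file): it is built with sprintf to the quoted column layout, with a y
value of -100.123 chosen so that it fills its whole 8-column field. The name
coords_by_column is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull x, y, z out of one ATOM record by column position:
# columns 31-38, 39-46 and 47-54 (1-based), i.e. offsets 30, 38, 46.
sub coords_by_column {
    my ($line) = @_;
    my @xyz = map { substr($line, $_, 8) } (30, 38, 46);
    s/^\s+|\s+$//g for @xyz;    # strip the padding spaces
    return @xyz;
}

# Hypothetical record built to the column layout; y = -100.123 fills
# its whole 8-column field and so touches the x field.
my $line = sprintf('%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f',
    'ATOM', 2, ' CA', '', 'LEU', '', 2, '', -9.711, -100.123, 53.419, 1.00, 48.40);

my @by_space  = split ' ', $line;          # x and y come back as one field
my @by_column = coords_by_column($line);   # x, y, z separated correctly

print "split:  $by_space[5]\n";
print "column: @by_column\n";
```

With this record the split version reports "-9.711-100.123" as a single field,
while the substr version returns the three coordinates separately. For data
like the posted examples, where every value is shorter than its field, both
approaches give the same numbers.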

Mark G.

> ----------------------------------------------------------------------
>  1 -  6   Record name    "ATOM  "
>  7 - 11   Integer        serial      Atom serial number.
> 
> these are the 1st two fields.
> On 5/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: 
> > 
> > 
> > 
> > ----- Original Message -----
> > From: Aditi Gupta <[EMAIL PROTECTED]>
> > Date: Monday, May 9, 2005 11:41 am
> > Subject: extracting coordinates
> > 
> > > Hi everyone,
> > Hello Aditi,
> > 
> > >
> > > That code is working...
> > > But my specific problem is as follows:
> > >
> > > i have a file in which data is stored as
> > >
> > > HELIX 4 4 VAL 74 LEU 84 1 11
> > > CRYST1 33.020 33.750 75.670 90.00 90.00 90.00 P 21 21 21 4
> > > ORIGX1 1.000000 0.000000 0.000000 0.00000
> > > ORIGX2 0.000000 1.000000 0.000000 0.00000
> > > ORIGX3 0.000000 0.000000 1.000000 0.00000
> > > SCALE1 0.030285 0.000000 0.000000 0.00000
> > > SCALE2 0.000000 0.029630 0.000000 0.00000
> > > SCALE3 0.000000 0.000000 0.013215 0.00000
> > > ATOM 1 N LEU 2 -10.586 -14.055 54.397 1.00 49.37 N
> > > ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C
> > > ATOM 3 C LEU 2 -10.401 -12.068 52.928 1.00 46.56 C
> > > ATOM 4 O LEU 2 -11.440 -12.138 52.267 1.00 47.05 O
> > > ATOM 5 CB LEU 2 -9.417 -14.253 52.223 1.00 51.90 C
> > > ATOM 6 CG LEU 2 -7.974 -14.441 51.748 1.00 54.45 C
> > > ATOM 7 CD1 LEU 2 -7.365 -13.109 51.342 1.00 53.43 C
> > > ATOM 8 CD2 LEU 2 -7.160 -15.095 52.852 1.00 55.22 C
> > > ATOM 9 N THR 3 -9.833 -10.909 53.259 1.00 42.49 N
> > > ATOM 10 CA THR 3 -10.405 -9.634 52.826 1.00 40.93 C
> > > ATOM 11 C THR 3 -10.060 -9.403 51.362 1.00 41.24 C
> > >
> > >
> > >
> > > the fields of records having ATOM as 1st field are as follows:
> > >
> > > COLUMNS   DATA TYPE      FIELD       DEFINITION
> > > ----------------------------------------------------------------------
> > >  1 -  6   Record name    "ATOM  "
> > >  7 - 11   Integer        serial      Atom serial number.
> > > 13 - 16   Atom           name        Atom name.
> > > 17        Character      altLoc      Alternate location indicator.
> > > 18 - 20   Residue name   resName     Residue name.
> > > 22        Character      chainID     Chain identifier.
> > > 23 - 26   Integer        resSeq      Residue sequence number.
> > > 27        AChar          iCode       Code for insertion of residues.
> > > 31 - 38   Real(8.3)      x           Orthogonal coordinates for X in Angstroms.
> > > 39 - 46   Real(8.3)      y           Orthogonal coordinates for Y in Angstroms.
> > > 47 - 54   Real(8.3)      z           Orthogonal coordinates for Z in Angstroms.
> > > 55 - 60   Real(6.2)      occupancy   Occupancy.
> > > 61 - 66   Real(6.2)      tempFactor  Temperature factor.
> > > 73 - 76   LString(4)     segID       Segment identifier, left-justified.
> > > 77 - 78   LString(2)     element     Element symbol, right-justified.
> > > 79 - 80   LString(2)     charge      Charge on the atom.
> > >
> > >
> > >
> > > I have to get the x,y,z coordinates of records whose atom name is
> > > 'CA'(highlighted as blue).
> > >
> > > I wrote a code but its giving many errors..
> > >
> > > The code is:
> > >
> > >
> > >
> > > #!usr/bin/perl
> > > use warnings;
> > >
> > > $filename = "1a32.txt";
> > > chomp $filename;
> > 
> > The above line is useless: $filename has no trailing newline for chomp
> > to remove. See perldoc -f chomp.
> > 
> > > open (FILEHANDLE, "$filename") or die "couldn't open $filename:$!";
> > > @file= <FILEHANDLE>;
> > > close (FILEHANDLE);
> > >
> > > $a= "ATOM";
> > > $c= "CA";
> > >
> > > foreach $line(@file)
> > > {
> > > if(my $line =~ /^/$a/\s*
> > > (\s*\d+)
> > > \s*/$c/\s*
> > > \d*
> > > \w+
> > > \s
> > > \w
> > > (\s*\d+)
> > > \w*
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\s*\d*)
> > > (\w*\s*)
> > > (\s*\w*)
> > > (\s*\w*)/)
> > 
> > Ouch, that is way too long a regular expression [ at least for me ]. You
> > may consider a shorter version such as my @fields = $line =~ /(\S+)/g.
> > In any case I think split would work best here [ my @fields = split
> > ' ', $line ], since your fields are locked into place. In general you PAD
> > data until it is all uniform, such as yours. Below is some simple
> > code that should help you on your way; feel free to modify at will.
> > 
> > >
> > > {
> > > my $x= substr($line,30,8);
> > > my $y= substr($line,38,8);
> > > my $z= substr($line,46,8);
> > >
> > > print "$x\t$y\t$z\n";
> > > }
> > > }
> > >
> > > #-------------------------------------------------
> > >
> > >
> > >
> > > The errors that i'm getting are:
> > >
> > > Scalar found where operator expected at two.pl line 15, near "/^/$a"
> > > (Missing operator before $a?)
> > > Unrecognized escape \d passed through at two.pl line 15.
> > > Unrecognized escape \s passed through at two.pl line 15.
> > > Unrecognized escape \w passed through at two.pl line 17.
> > > Unrecognized escape \s passed through at two.pl line 17.
> > > Unrecognized escape \w passed through at two.pl line 17.
> > > Unrecognized escape \s passed through at two.pl line 17.
> > > Backslash found where operator expected at two.pl line 22, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 17)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl
> > > line 22.
> > > Backslash found where operator expected at two.pl line 23, near ")
> > > \"
> > > (Missing operator before \?)
> > > Unquoted string "w" may clash with future reserved word at two.pl
> > > line 23.
> > > Unrecognized escape \s passed through at two.pl line 24.
> > > Backslash found where operator expected at two.pl line 25, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 24)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl
> > > line 25.
> > > Unrecognized escape \s passed through at two.pl line 26.
> > > Backslash found where operator expected at two.pl line 27, near
> > > "(\s*\" (Might be a runaway multi-line ** string starting on line 26)
> > > (Missing operator before \?)
> > > Unquoted string "d" may clash with future reserved word at two.pl
> > > line 27.
> > > Unrecognized escape \w passed through at two.pl line 28.
> > > Backslash found where operator expected at two.pl line 29, near
> > > "(\w*\" (Might be a runaway multi-line ** string starting on line 28)
> > > (Missing operator before \?)
> > > Unrecognized escape \w passed through at two.pl line 29.
> > > syntax error at two.pl line 15, near "/^/$a"
> > > Substitution replacement not terminated at two.pl line 31.
> > >
> > > Please help me..
> > >
> > 
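> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> > my $filename = "1a32.txt";
> > my $Atom = 'CA';
> > 
> > open (FILEHANDLE, '<', $filename) or die "couldn't open $filename: $!";
> > my @file = <FILEHANDLE>;
> > close (FILEHANDLE);
> > 
> > foreach my $line ( @file ){
> > 
> > # split ' ' skips leading whitespace and splits on runs of whitespace
> > my @fields = split ' ', $line;
> > next unless @fields > 5 and $fields[0] eq 'ATOM';
> > # In the sample data each ATOM line ends with occupancy, tempFactor and
> > # element, so x, y, z are the 6th, 5th and 4th fields from the end.
> > print "X: $fields[-6] Y: $fields[-5] Z: $fields[-4]\n" if uc $fields[2] eq $Atom;
> > 
> > }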
> > 
> > HTH,
> > Mark G.
> > 
> >
> 
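
On the errors quoted above: in the original pattern the "/" right after "^"
closes the regular expression, so perl then finds $a where it expects an
operator. Variables interpolate into a pattern without extra slashes, and a
pattern spread over several lines needs the /x modifier so that the line
breaks are ignored. Also, "my $line =~" declares a new, empty $line instead of
testing the loop variable. A minimal sketch of a working regex version
follows, assuming whitespace-separated records like the posted examples; the
name match_coords and the optional chain-identifier group are illustrative
assumptions, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = 'ATOM';   # record name to look for
my $atom   = 'CA';     # atom name to look for

# Return (x, y, z) if the line is a whitespace-separated ATOM record for
# the wanted atom name, or an empty list if it is not.
sub match_coords {
    my ($line) = @_;
    my $num = qr{-?\d+\.\d+};          # a signed decimal number
    my @xyz = $line =~ m{
        ^ \Q$record\E \s+              # record name, interpolated directly
        \d+ \s+                        # atom serial number
        \Q$atom\E \s+                  # atom name
        \w+ \s+                        # residue name
        (?: [A-Za-z] \s+ )?            # chain identifier, if present
        \d+ \s+                        # residue sequence number
        ($num) \s+ ($num) \s+ ($num)   # the three coordinates
    }x;
    return @xyz;
}

my @xyz = match_coords('ATOM 2 CA LEU 2 -9.711 -13.341 53.419 1.00 48.40 C');
print "@xyz\n" if @xyz;
```

The split version is still shorter for this data; the regex is shown only to
make the cause of the reported errors concrete.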


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>
