Here are the written instructions as I managed to get them from my professor; the graphs and data are attached:
The graph below shows an example of the expected outcome of this coursework. You may produce a better one.

The graph for analysing the motifs of a set of peptides is designed this way:
   • The graph is composed of columns of coloured rectangles.
   • Each column corresponds to a residue from “N4” to “C4”. Note that the eight residues are denoted by “N4”, “N3”, “N2”, “N1”, “C1”, “C2”, “C3”, “C4”. “N4” means the 4th flanking residue of a cleavage site on the N-terminal side and “C3” means the 3rd flanking residue of a cleavage site on the C-terminal side. The cleavage occurs between “N1” and “C1”.
   • There are 20 rectangles in each column, corresponding to the 20 amino acids. The rectangle of an amino acid is taller if that amino acid occurs more frequently at the residue, for instance the rectangle of “S” in the first column for the cleaved peptides.
   • The letter of an amino acid is printed within its rectangle. Its font size depends on the frequency of the amino acid at the residue.

In your package, you need to have the following functions:

1. Set a colour map using the following or your own design:
       colmap<-c("#FFFFFF", "#FFFFCC", "#FFFF99", "#FFFF66", "#FFFF33", "#FFFF00",
                 "#FFCCFF", "#FFCCCC", "#FFCC99", "#FFCC66", "#FFCC33", "#FFCC00",
                 "#FF99FF", "#FF99CC", "#FF9999", "#FF9966", "#FF9933", "#FF9900",
                 "#FF33FF", "#FF33CC")
2. Define a set of amino acids using a string, or another format if you want:
       amino.acid<-"ACDEFGHIKLMNPQRSTVWY"
3. Read in the given peptide data (“hiv.dat”) using
       read.table("../data/hiv.dat",header=TRUE)
   • The data I sent to you should not be saved in the same directory where you save your R code!
   • The data is composed of two parts, cleaved (denoted by “cleaved”) and non-cleaved (denoted by “noncleaved”).
   • The first five lines of the data are shown below:
       Peptide   Label
       TQIMFETF  cleaved
       GQVNYEEF  cleaved
       KVFGRCEL  noncleaved
       VFGRCELA  noncleaved
   • To access the ith peptide, you can use X$Peptide[i]
   • To access the ith label, you can use X$Label[i]
4. Detect the number of cleaved peptides and the number of non-cleaved peptides using
       nrow(X)
5. Define two matrices with initialised entries, one for positive peptides and one for negative peptides:
       matrix(0,AA,mer)
   where AA is the number of amino acids and mer is the number of residues, detected from the data using the nchar function.
   • Both matrices have the same size: the number of rows equals the number of amino acids and the number of columns equals the number of residues in the peptides.
   • Name the columns of these two matrices using
       c("N4","N3","N2","N1","C1","C2","C3","C4")
6. Use one three-loop structure to detect the frequency of amino acids in cleaved peptides and one three-loop structure to detect the frequency of amino acids in non-cleaved peptides. They should not be mixed in one three-loop structure. The best way to handle this is to use a function. An example of the three-loop structure:
       for(i in 1:num)        # scan all peptides; num is the number of peptides
       {
         for(j in 1:mer)      # scan all residues in a peptide
         {
           for(k in 1:AA)     # scan the 20 amino acids
           {
             # actions
           }
         }
       }
7. Make sure that each frequency matrix is converted to percentages, i.e. each entry in the matrix is divided by the number of cleaved or non-cleaved peptides and multiplied by 100. This converted frequency is called the normalised frequency.
8. Detect the maximum height of the normalised frequency over the residues in cleaved or non-cleaved peptides using
       height<-rep(0,mer)
       for(j in 1:mer) height[j]<-sum(round(X.frequency[,j]))
       max.height<-max(height)
   • Note that the height of each column in a graph (see the graph on 3) corresponds to the sum of the 20 frequencies of the 20 amino acids for a residue.
9. Draw a blank plot using the maximum height:
       plot(c(0,10*mer),c(0,max.height),col="white", ...)
   • In this blank plot, you can add graphics as discussed below.
10. Determine the x-coordinate; it is recommended to use i*10, where i indexes the residues. The x-coordinate represents columns in the graph shown in 3. If there are 8 residues in the peptides, there are 8 columns.
11. Determine the y-coordinate, which is cumulative (see the next item). The y-coordinate represents rows in the graph shown in 3. There are always 20 rows for the 20 amino acids. Note that the rows cannot be aligned across columns because the frequency of an amino acid varies from residue to residue.
12. Draw a rectangle based on the frequency for each residue and each amino acid:
       rect(x,y,x+10,y+round(X.frequency[k,j]),col=colmap[k])
    where k indicates an amino acid and j indicates a residue.
   • After drawing this rectangle, the y-coordinate “y” should be increased by round(X.frequency[k,j]).
   • After one column is drawn for one residue, the x-coordinate “x” should be increased by 10.
13. Plot a text label at the corresponding position using
       text((x+5),(y+round(X.frequency[k,j])/2),substr(amino.acid,k,k))
14. Place the two drawings in one plot using the par function.

Attachments:
   cleaved.jpg     http://n4.nabble.com/file/n1457645/cleaved.jpg
   noncleaved.jpg  http://n4.nabble.com/file/n1457645/noncleaved.jpg
   hiv.dat         http://n4.nabble.com/file/n1457645/hiv.dat
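
For reference, steps 1 to 14 above could be strung together roughly as follows. This is only a sketch under stated assumptions, not the professor's solution: the helper names count.frequency and draw.motif are made up for illustration, the four peptides quoted in the instructions stand in for hiv.dat so that the code runs without the attachment, all peptides are assumed to be 8 residues long, and the font scaling (cex) is an arbitrary choice. The x-coordinate starts at 0 rather than i*10 so that the columns fill the 0 to 10*mer range of the blank plot.

```r
# Steps 1-2: colour map and amino-acid alphabet, copied from the instructions.
colmap <- c("#FFFFFF", "#FFFFCC", "#FFFF99", "#FFFF66", "#FFFF33", "#FFFF00",
            "#FFCCFF", "#FFCCCC", "#FFCC99", "#FFCC66", "#FFCC33", "#FFCC00",
            "#FF99FF", "#FF99CC", "#FF9999", "#FF9966", "#FF9933", "#FF9900",
            "#FF33FF", "#FF33CC")
amino.acid <- "ACDEFGHIKLMNPQRSTVWY"

# Step 3: the real call would be read.table("../data/hiv.dat", header=TRUE).
# ASSUMPTION: a four-row stand-in built from the rows quoted in the instructions.
X <- data.frame(Peptide = c("TQIMFETF", "GQVNYEEF", "KVFGRCEL", "VFGRCELA"),
                Label   = c("cleaved", "cleaved", "noncleaved", "noncleaved"),
                stringsAsFactors = FALSE)

# Steps 4-7: normalised frequency matrix (in percent) for one group of peptides.
# count.frequency is a hypothetical helper name.
count.frequency <- function(peptides, amino.acid) {
  peptides <- as.character(peptides)       # read.table may return factors
  num <- length(peptides)                  # number of peptides in this group
  AA  <- nchar(amino.acid)                 # number of amino acids (20)
  mer <- nchar(peptides[1])                # residues per peptide (assumed 8)
  freq <- matrix(0, AA, mer)
  rownames(freq) <- strsplit(amino.acid, "")[[1]]
  colnames(freq) <- c("N4", "N3", "N2", "N1", "C1", "C2", "C3", "C4")
  for (i in seq_len(num)) {                # scan all peptides
    for (j in seq_len(mer)) {              # scan all residues in a peptide
      for (k in seq_len(AA)) {             # scan the 20 amino acids
        if (substr(peptides[i], j, j) == substr(amino.acid, k, k))
          freq[k, j] <- freq[k, j] + 1
      }
    }
  }
  freq / num * 100                         # step 7: convert counts to percent
}

# Steps 8-13: draw one stacked-rectangle graph; draw.motif is a hypothetical name.
draw.motif <- function(X.frequency, amino.acid, colmap, main = "") {
  AA  <- nrow(X.frequency)
  mer <- ncol(X.frequency)
  height <- rep(0, mer)                    # step 8: column heights
  for (j in 1:mer) height[j] <- sum(round(X.frequency[, j]))
  max.height <- max(height)
  plot(c(0, 10 * mer), c(0, max.height), col = "white",   # step 9: blank plot
       xlab = "", ylab = "Frequency (%)", xaxt = "n", main = main)
  axis(1, at = (1:mer) * 10 - 5, labels = colnames(X.frequency))
  x <- 0
  for (j in 1:mer) {
    y <- 0                                 # step 11: y is cumulative per column
    for (k in 1:AA) {
      h <- round(X.frequency[k, j])
      if (h > 0) {                         # skip amino acids that never occur
        rect(x, y, x + 10, y + h, col = colmap[k])         # step 12
        text(x + 5, y + h / 2, substr(amino.acid, k, k),   # step 13
             cex = 0.5 + h / 50)           # ASSUMPTION: arbitrary font scaling
      }
      y <- y + h
    }
    x <- x + 10                            # move on to the next column
  }
  invisible(max.height)
}

# One call per group, so the two three-loop passes are never mixed (step 6).
cleaved    <- count.frequency(X$Peptide[X$Label == "cleaved"], amino.acid)
noncleaved <- count.frequency(X$Peptide[X$Label == "noncleaved"], amino.acid)

# Step 14: both drawings side by side in one plot.
op <- par(mfrow = c(1, 2))
draw.motif(cleaved, amino.acid, colmap, main = "cleaved")
draw.motif(noncleaved, amino.acid, colmap, main = "noncleaved")
par(op)
```

With the real data, only the definition of X would change to the read.table call from step 3; the number of peptides in each group can then be checked with nrow(X[X$Label == "cleaved", ]) and nrow(X[X$Label == "noncleaved", ]).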