On 09/30/2010 10:53 AM, Ivan Gregoretti wrote: > Hello everybody, > > How do you perform a BLAST alignment of 1 sequence within R? > Any pointer would be appreciated. > > The sequence in question is about 40kb and the pertinent genome is mm9.
Hi Ivan One possibility comes from http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html where one could load XML library(XML) then formulate a query baseUrl <- "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi" query <- paste("QUERY=555&DATABASE=nr&HITLIST_SIZE=10&FILTER=L", "&EXPECT=10&PROGRAM=blastn", sep="") url0 <- sprintf("%s?%s&CMD=Put", baseUrl, query) Then submit the query and extract the return result id (RID) using XML and XPATH queries (see http://www.w3.org/TR/xpath/, especially section 2.5) post <- htmlTreeParse(url0, useInternalNodes=TRUE) rid <- xpathApply(post, '//inp...@name="RID"]...@id="rid"]', xmlAttrs)[[1]][["value"]] and then finally retrieve the result url1 <- sprintf("%s?RID=%s&FORMAT_TYPE=XML&CMD=Get", baseUrl, rid) result <- xmlTreeParse(url1, useInternalNodes=TRUE) what to do now depends on the info you want out, e.g., qseq <- xpathApply(result, "//Hsp_qseq", xmlValue) hseq <- xpathApply(result, "//Hsp_hseq", xmlValue) midline <- xpathApply(result, "//Hsp_midline", xmlValue) for (i in 1:5) cat(qseq[[i]], hseq[[i]], midline[[i]], sep="\n") Might be fun (though a bit of work, to cast the alignments onto common reference coordinates) to use this in the new Biostrings::DNAMultipleAlignment constructor. Martin > > Thank you, > > Ivan > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
