Thanks Richard. The script looks handy and at minimum may help me deal with searching for my identifiers in more than one database.
cheers,
Bela

*************************
Dr. Bela Tiwari
Lead Bioinformatician
NERC Environmental Bioinformatics Centre
http://nebc.nerc.ac.uk
tel: 01491 69 2705

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
*************************
________________________________________
From: Richard Rothery [[email protected]]
Sent: 18 June 2010 15:54
To: Tiwari, Bela
Cc: [email protected]
Subject: Re: [EMBOSS] getting organism from accession

I have a perl script written by Craig Knox of the University of Alberta
Bioinformatics help desk that does this. I have attached it FYI. It is slow,
but gets the job done. Output can be fed to gnumeric etc.

Richard

On Fri, 2010-06-18 at 13:07 +0100, Tiwari, Bela wrote:
> Dear all,
>
> I have a set of accession numbers and I want to retrieve the organism that
> the sequence is associated with - i.e. the content of the OS line in an embl
> file. I don't need the taxonomic id, and I don't need to start traversing
> taxonomy trees. I want to do this by accessing remote databases (via srs, as
> configured in my emboss.defaults file), rather than indexing databases
> locally. So the output I want would be a text mapping like:
>
> accession : species
>
> where species is taken from the OS line of a database entry.
>
> The closest I've made it to using Emboss is to get the gff output file
> containing feature information using a command along the lines of:
>
> seqret -feature embl:XXXX -oufo2 myfeat.txt
>
> (embl is a database I can search using srs as configured in my
> emboss.defaults file.) The first non-hashed line in the file myfeat.txt
> contains the term
>
> "organism="Whateverus thingus"
>
> so I could parse that out. However, this file still contains a lot of extra
> (unwanted) information and requires parsing.
>
> Does anyone know if I'm missing something obvious in Emboss that I could use
> for this?
>
> (I have tried the BioPerl route to get this info from the NCBI, and apart
> from being unwieldy, I'm managing to get the wrong organism returned for the
> type of identifier I have. No, I haven't spent time tracking down the problem
> - frankly, I'd rather resolve it using Emboss and/or srs calls.)
>
> If there isn't anything that will do the job in Emboss at the moment, is
> there any chance I can put in a development request for an extra flag for
> seqret, or an extra utility tool that might accomplish this task?
>
> cheers,
>
> Bela

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
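
The parsing step described in the quoted message (pulling the organism out of the feature file, or out of the OS line of an EMBL entry) could be sketched roughly as below. This is only an illustration, not the attached Perl script and not an EMBOSS feature. The function names organism_from_gff and organism_from_embl are made up for this sketch, and it assumes the feature file carries an attribute of the form organism="Genus species" (quotes optional) and that OS lines follow the usual EMBL flat-file layout of a two-letter line code followed by spaces. Escaped characters in GFF attributes are not decoded.

```python
import re

# Assumed attribute form: organism="Genus species" or organism=Genus species
_ORGANISM_RE = re.compile(r'organism="?([^";\t\n]+)')


def organism_from_gff(gff_text):
    """Return the first organism attribute found on a non-comment line, or None."""
    for line in gff_text.splitlines():
        if line.startswith("#"):
            continue  # skip the "hashed" header/comment lines
        match = _ORGANISM_RE.search(line)
        if match:
            return match.group(1).strip()
    return None


def organism_from_embl(embl_text):
    """Join the text of all OS lines in one EMBL-format entry, or return None."""
    parts = [line[2:].strip()
             for line in embl_text.splitlines()
             if line.startswith("OS ")]
    return " ".join(parts) if parts else None


if __name__ == "__main__":
    # Hypothetical feature-file content for accession XXXX
    sample = ('##gff-version 3\n'
              'XXXX\tEMBL\tsource\t1\t100\t.\t+\t.\t'
              'organism="Whateverus thingus"\n')
    print("XXXX : " + str(organism_from_gff(sample)))
```

Run over one feature file (or one retrieved entry) per accession, this would give the "accession : species" lines asked for, although it still costs one remote fetch per accession.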
