Hi, Jia,
I am not sure whether this is what you want:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Apis_mellifera/mapview/org_transcript.gff.gz

Good luck!

Wenwu Cui PhD
Sr. Bioinformatics Scientist
NEXTBIO
3 Results Way
Cupertino CA 95014

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jennifer Jackson
Sent: Monday, August 24, 2009 1:37 PM
To: Jia Zeng
Cc: genome
Subject: Re: [Genome] Question for honey bee genome

Hello again,

One of our scientific developers has some very good advice for your situation. Here are the comments:

Instead of using scaffold sequences, use what we call group sequences (we treat these as "chromosomes" for browser purposes). The data can be downloaded from this location:
http://hgdownload.cse.ucsc.edu/goldenPath/apiMel2/bigZips/GroupFa.zip

There are some complications concerning assembly versions, which could potentially cause the coordinate differences you mentioned last week (assuming the workaround did not solve all cases). The "apiMel2" browser is for the Jan. 2005 (Baylor Amel_2.0) assembly. There was a later assembly (Amel_3.0, May 2005), and we have the sequence for that available on our Downloads server, named apiMel3, but you will notice that only sequence is available (no browser). If that is not a concern, annotation from the older assembly can be converted to the new assembly using liftOver for flat file use.

Meanwhile, Baylor has released Amel_4.0, so NCBI is probably using a more recent version than ours. Coordinates from NCBI can't be compared with our coordinates, nor should they be fed to the Table Browser (TB). Aligning to our 2.0 sequence, and getting 2.0 gene sequences from the TB, will probably give sequences that are mostly, but not completely, in accord with gene sequences derived from a later version. Because of this, we do not recommend using the TB for data manipulation between these assemblies.

Overall, for the most current genomic data, downloading the assembly sequence on which the NCBI coordinates are based, and then either running blat against it or just using NCBI's annotation coordinates, may be the best solution.

Some tools from our code tree that will work (regardless of assembly source):

faFrag (or faToTwoBit on the whole sequence, and twoBitToFa -seqList=coordFile to fetch regions) to extract sequence from the start and end coordinates
or
featureBits -chromSize with region specifiers

Links:
http://genome.ucsc.edu/FAQ/FAQdownloads#download27
http://genomewiki.cse.ucsc.edu/index.php/Kent_source_utilities

Thanks,
Jennifer

------------------------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group

----- "Jia Zeng" <[email protected]> wrote:

> From: "Jia Zeng" <[email protected]>
> To: "genome" <[email protected]>
> Sent: Monday, August 24, 2009 8:12:15 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] Question for honey bee genome
>
> To whom it may concern:
>
> I am doing some analysis on the honey bee genome and have come across a
> problem that I need your help with. I run blat on my local server and
> get a psl format output file, which records the coordinates of the
> alignment blocks. The problem is that the database file I downloaded
> is the scaffold sequence for the honey bee genome, from the scaffold
> track in the UCSC Table Browser. After searching against this
> database, I compared the output to the result from web-based blat. It
> turned out that the output for the same RefSeq ID has the same hit but
> different coordinates. I think the difference is due to different
> coordinate systems (one based on a linkage group and the other based
> on a certain scaffold). In this case I can't extract the right
> sequence based on the scaffold coordinates. So, I was wondering if
> there is a resource, like the sequence for a whole linkage group, that
> I can use as the database? Thank you!
>
> Jia
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
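
For readers following this thread, the region-extraction step described in the reply above (fetching sequence from start and end coordinates) can be illustrated with a minimal Python sketch. This is not the code of faFrag or twoBitToFa; it only mimics the idea in plain Python. The function names, the toy FASTA records, and the "name:start-end" region string are all hypothetical, and the sketch assumes 0-based, half-open coordinates as used in psl output. Check each tool's usage message for the convention it actually expects.

```python
# Toy FASTA text standing in for a group (linkage group) sequence file.
# The record names and bases are made up for illustration only.
EXAMPLE_FASTA = """>Group1
ACGTACGTAC
GGTTAACC
>Group2
TTTTCCCC
"""


def read_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def parse_region(spec):
    """Split a hypothetical 'name:start-end' string into (name, start, end)."""
    name, _, span = spec.rpartition(":")
    start_text, _, end_text = span.partition("-")
    return name, int(start_text), int(end_text)


def fetch_region(records, name, start, end):
    """Return bases [start, end) of a record (assumed 0-based, half-open)."""
    seq = records[name]
    if not 0 <= start <= end <= len(seq):
        raise ValueError(f"region {start}-{end} is outside {name} (length {len(seq)})")
    return seq[start:end]


if __name__ == "__main__":
    records = read_fasta(EXAMPLE_FASTA)
    name, start, end = parse_region("Group1:4-10")
    print(fetch_region(records, name, start, end))
```

The sequence name and coordinates must come from the same assembly as the sequence file; as noted in the reply, coordinates from one assembly version (or from scaffolds rather than groups) will not pick out the intended bases in another.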
