Hello again,

One of our scientific developers has some very good advice for your situation. 
Here are the comments:

Instead of using scaffold sequences, use what we call group sequences (we treat 
these as "chromosomes" for browser purposes).  The data can be downloaded from 
this location:

http://hgdownload.cse.ucsc.edu/goldenPath/apiMel2/bigZips/GroupFa.zip

There are some complications concerning assembly versions, which could 
potentially cause the coordinate differences you mentioned last week (assuming 
the workaround did not solve all cases).  The "apiMel2" browser is for the 
Jan. 2005 (Baylor Amel_2.0) assembly.  There was a later assembly (Amel_3.0, May 
2005), and we have the sequence for that available on our Downloads server, 
named apiMel3, but you will notice that only sequence is available (no 
browser).  If that is not a concern, annotation from the older assembly can be 
converted to the new assembly using liftOver for flat file use.

Meanwhile, Baylor has released Amel_4.0, so NCBI probably is using a more 
recent version than ours.  Coordinates from NCBI can't be compared with our 
coordinates, nor should they be fed to the Table Browser.  Aligning to our 2.0 
sequence, and getting 2.0 gene sequences from the TB, will probably give 
sequences that are mostly in accord with gene sequences derived from a later 
version, but not completely.  Because of this, we do not recommend using the TB 
for data manipulation between these assemblies.

Overall, for the most current genomic data, downloading the assembly sequence 
on which NCBI coordinates are based and either running blat against it or just 
using NCBI's annotation coordinates may be the best solution.

Some tools from our code tree that will work (regardless of assembly source):

faFrag (or faToTwoBit on the whole sequence, then twoBitToFa -seqList=coordFile 
to fetch regions) to extract sequence from the start and end coordinates,
or
featureBits -chromSize with region specifiers.
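
As a rough illustration of the "extract sequence from the start and end 
coordinates" step, here is a minimal Python sketch. It is not part of the Kent 
source tree. The function names (read_fasta, fetch_region, seq_list_line) are 
hypothetical, and the sketch assumes 0-based, half-open coordinates, which is 
how blat's psl output reports positions. The "name:start-end" line format for a 
twoBitToFa -seqList file is written from memory of the tool's usage message, so 
please check it against the help text of your build.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}.

    The name is the first word after '>' (assumption: names such as
    'Group1' contain no spaces).
    """
    seqs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the record that just ended before starting a new one.
            if name is not None:
                seqs[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fetch_region(seqs, name, start, end):
    """Return seqs[name][start:end], with start 0-based and end exclusive."""
    seq = seqs[name]
    if not (0 <= start <= end <= len(seq)):
        raise ValueError(f"region {name}:{start}-{end} is outside the sequence")
    return seq[start:end]


def seq_list_line(name, start, end):
    """Format one region as 'name:start-end' (assumed -seqList line format)."""
    return f"{name}:{start}-{end}"


if __name__ == "__main__":
    # Toy data only; a real run would pass an open file handle for the
    # unzipped group FASTA instead of this list.
    toy = [">Group1 toy record", "ACGTAC", "GTTT", ">Group2", "GGCC"]
    seqs = read_fasta(toy)
    print(fetch_region(seqs, "Group1", 2, 6))
    print(seq_list_line("Group1", 2, 6))
```

Reading a whole group file into memory is fine for a quick check. For many 
regions on a full assembly, the 2bit route above is likely to be faster.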

Links:
http://genome.ucsc.edu/FAQ/FAQdownloads#download27
http://genomewiki.cse.ucsc.edu/index.php/Kent_source_utilities

Thanks, Jennifer

------------------------------------------------ 
Jennifer Jackson 
UCSC Genome Bioinformatics Group 

----- "Jia Zeng" <[email protected]> wrote:

> From: "Jia Zeng" <[email protected]>
> To: "genome" <[email protected]>
> Sent: Monday, August 24, 2009 8:12:15 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] Question for honey bee genome
>
> To whom it may concern:
> 
> I  am doing some analysis in honey bee genome and I come across some
> problem that need your help. What I do is to run blat in my local
> server and get the pls format output file which record the coordinates
> of the alignment block . But the problem is the database file I
> downloaded is the scaffold sequence for honey bee genome in UCSC table
> browser which under the scaffold track. After searching against this
> database, I compared the output to the result I  searched in web-base
> blat, it turned out that the output for a same refseq ID has the same
> hit but different coordinates, I think the difference is due to
> different coordinates system(one based on linkage group and the other
> based on a certain scaffold.) In this case I can't extract the right
> sequence based on the scaffold coordinates.  So, I was wondering if
> there is a resource like the sequence for a whole linkage group I can
> use as the database? Thank you !
> 
> Jia   
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
