Hi Anyuan,

I don't believe it is possible to retain the RefSeq ID in this case when 
using the Table Browser.  However, I think that Galaxy has this 
capacity, either by doing the intersection from scratch using their 
tools, or by joining your MAF with your custom track based on the genome 
coordinates.
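
To illustrate the coordinate-based join, here is a minimal Python sketch. It is not how Galaxy implements it, only the general idea. The function names (read_bed, maf_blocks, label_blocks) are made up for this example. It assumes the first "s" line of each MAF block is the hg18 reference on the + strand, and that the name column of your custom track BED carries the RefSeq ID. The accessions in the usage note are placeholders.

```python
from collections import defaultdict


def read_bed(lines):
    """Return {chrom: [(start, end, name), ...]} from BED lines.

    BED coordinates are 0-based and half-open. Lines with fewer than
    four fields (e.g. "track"/"browser" headers) are skipped.
    """
    regions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or line.startswith("#"):
            continue
        regions[fields[0]].append((int(fields[1]), int(fields[2]), fields[3]))
    return regions


def maf_blocks(lines):
    """Yield each alignment block as a list of lines, 'a' line first."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.split()[:1] == ["a"]:
            if block:
                yield block
            block = [line]
        elif block and line.strip() and not line.startswith("#"):
            block.append(line)
    if block:
        yield block


def label_blocks(maf_lines, regions, prefix="hg18."):
    """Yield (names, block) pairs, where names lists the BED records
    that overlap the block's reference (first "s") line.

    Assumption: the reference is on the + strand, so its MAF start
    (0-based) and size map directly onto BED coordinates.
    """
    for block in maf_blocks(maf_lines):
        ref = next((l.split() for l in block if l.startswith("s ")), None)
        if ref is None:
            continue
        src, start, size = ref[1], int(ref[2]), int(ref[3])
        chrom = src[len(prefix):] if src.startswith(prefix) else src
        end = start + size
        # Linear scan for clarity; a genome-wide file would want an
        # interval index or sorted sweep instead.
        names = [n for (s, e, n) in regions.get(chrom, [])
                 if s < end and start < e]
        yield names, block
```

For example, with a BED line "chr1 14700 15700 NM_000001_up_1000" (a placeholder name), a block whose first line is "s hg18.chr1 14754 99 + 247249719 ..." would be labelled with that name, and you could then write the name into the output however you need.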

Galaxy has screencasts:
http://galaxy.psu.edu/screencasts.html

and a wiki:
http://g2.trac.bx.psu.edu/

This screencast might be particularly helpful:
http://screencast.g2.bx.psu.edu/galaxy/MAF_manipulation/

If you have more questions about how to accomplish your task using 
Galaxy, you can contact them at [EMAIL PROTECTED]

Good luck with your research.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



On 12/10/08 15:17, Anyuan Guo wrote:
> Dear Brooke,
>    Thanks very much. I learned a lot about creating custom tracks from your 
> email. I can download a ~76Mb compressed file when I follow your 
> instructions to create a custom track for the upstream 1000 bp of RefSeq Genes 
> and intersect it with 17-way Cons. But I found that the file does not begin 
> with the RefSeq ID (NM_xxxx). The following are the first 4 lines of the file.
> ##maf version=1
> a score=-55252.000000
> s hg18.chr1                   14754  99 + 247249719 
> CTGTGGGTCGGAGCCGGAGCGTCAGAGC---------CACCCACGACCACCGGCACGCC----CCCACCACA-GGGCAGCGTGG-TGTTGAGACAAC------A
>  
> 
> 
>      In fact, I need a file that begins with the RefSeq ID. The downloaded MAF 
> file 
> (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz)
> exactly matches my requirement. But because some RefSeq sequences were 
> updated, the downloaded file is out of date.
>    The following is the first 4 lines of the downloaded file, which I need.
> ##maf version=1 scoring=zero
> a score=0.000000
> s NM_198943 0 1000 + 1000 
> GCATTTTAAACCCAAGTG----AAATCTCCTAGG----------CCCTTCATGCCACACTCA-----TCCATCCCTACCTAC--TTGTGTTGCAACCAAGGGCCCCAC
>  
> 
> 
> How can I get the up-to-date version of this download file?
> Thanks.
> 
> Anyuan
> 
> 
> Brooke Rhead wrote:
>> Hello Anyuan,
>>
>> The reason that the sequence is different via the download file and 
>> the Table Browser is that the sequence associated with NM_014223 at 
>> RefSeq has changed since the download file was made.  The items in the 
>> RefSeq Genes track are updated daily; the download files are generally 
>> only made once.
>>
>> You can see the revision history for any GenBank accession at NCBI:
>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=NM_014223
>>
>> The download file was last updated on 7-7-2007.  I tried blatting the 
>> NM_014223 sequence from the "Jun 3 2007 1:10 PM" update to the hg18 
>> assembly, and the sequence aligned starting at the genomic coordinate 
>> chr1:40,929,952.  The upstream sequence from the file you downloaded 
>> corresponds to the 1,000 bases upstream of that base.
>>
>> You can get an up-to-date version of the download file by creating 
>> it yourself with the Table Browser.  First, make a custom track of the 
>> upstream regions of RefSeq Genes.  If you select the RefSeq Genes 
>> track in the Table Browser and choose "output format: custom track", 
>> you will be presented with an option to create one BED record per 
>> region that is "Upstream by ___ bases".  Enter 1,000 or 2,000 in this 
>> box and hit "get custom track in genome browser".  You should see a 
>> new custom track containing blocks representing regions upstream of 
>> all RefSeq Genes.
>>
>> Now you can intersect your new custom track with the multiz alignment 
>> in the Conservation track to get only the upstream regions.  To do 
>> this step, select the 17-way (or 28-way) Conservation track in the 
>> Table Browser.  Select the table 'multiz17way' and region: genome.  
>> Hit the "intersection: create" button and select your custom track.  
>> Choose the option for "Base-pair-wise intersection (AND) of 17-Way 
>> Cons and upstream regions from refGene" and hit submit.  Back on the 
>> main Table Browser page, select "output format: MAF".  The size of the 
>> file you will be creating is quite large (76 Mb compressed for 1,000 
>> base regions).  I suggest entering a name for the file and selecting 
>> the option to get a gzip compressed version of it.  Hit "get output".  
>> You should end up with a MAF file that contains only the regions 
>> upstream of RefSeq Genes.
>>
>> You may also be interested in the tools for working with MAF 
>> alignments at Galaxy: http://galaxy.psu.edu/ .  Galaxy is run by our 
>> collaborators at Penn State and extends the functionality of the Table 
>> Browser.  For instance, there is a tool to filter any undesired 
>> species from a MAF file, leaving only the species of interest to you.
>>
>> I hope this is helpful.  If you have further questions, please feel 
>> free to contact us again at [EMAIL PROTECTED]  If you have 
>> questions specific to Galaxy, their helpdesk email address is 
>> [EMAIL PROTECTED]
>>
>> -- 
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> Subject: question or bug about UCSC genome browser sequence
>> From: Anyuan Guo <[EMAIL PROTECTED]>
>> Date: Mon, 17 Nov 2008 10:54:21 -0800
>> To: [email protected]
>>
>>
>> Dear author,
>>     Thank you for providing the wonderful database and website of the UCSC
>> genome browser.
>>     I have a question about the sequence in it.
>>        I downloaded the human upstream 1000 bp multiz alignment file from
>> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz
>>  
>>
>>     When I check my sequence ID, NM_014223,
>>     I can find the upstream 1000 bp sequence of this RefSeq gene in the
>> downloaded multiz alignment file.
>>     I can also search for this ID in the genome browser and get the upstream
>> 1000 bp using the "DNA" or "Tables" menu at the top of the genome browser 
>> page.
>>     But I find these two upstream 1000 bp sequences are totally
>> different. I think the one from the genome browser is right.
>>     But I do not need just the upstream 1000 bp sequence; I need the
>> alignment with the mouse sequence.
>>
>>     Can I just get the sequence alignment between human and mouse for
>> all the RefSeq genes and the upstream 1000 or 2000 bp of these genes? Where
>> can I find it?
>>     I think such ortholog gene alignments (including upstream
>> regulatory sequence alignments) between two popular genomes would be very
>> useful.
>>
>> thanks.
>>
>> Anyuan
>>
_______________________________________________
Genome maillist  -  [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
