Hi again, Anyuan,

One of our developers created updated versions of the hg18 17-way 
upstream MAF files for you.  They are located here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/

The time stamps on the files are:

  upstream1000.maf.gz     12-Dec-2008 13:51   52M
  upstream2000.maf.gz     12-Dec-2008 13:51  124M
  upstream5000.maf.gz     12-Dec-2008 13:52  272M

I hope this helps!

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 12/10/08 13:12, Brooke Rhead wrote:
> Hi Anyuan,
> 
> I don't believe it is possible to retain the RefSeq ID in this case when 
> using the Table Browser.  However, I think that Galaxy has this 
> capability, either by doing the intersection from scratch using their 
> tools, or by joining your MAF with your custom track based on the genome 
> coordinates.
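Joining on genome coordinates means asking, for each MAF block, which upstream region its hg18 row overlaps. The Python sketch below shows that overlap test under stated assumptions: the `Region` records and their names are hypothetical, regions use BED's 0-based half-open coordinates, and the hg18 row is assumed to be on the '+' strand, as is usual for the reference species. It is an illustration of the idea, not Galaxy's implementation.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """One BED record from an upstream-regions custom track (hypothetical)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # RefSeq accession, e.g. "NM_xxxx"


def names_for_block(chrom: str, start: int, size: int,
                    regions: List[Region]) -> List[str]:
    """Return the names of all regions overlapping a MAF block's hg18 row.

    In MAF, `start` is 0-based and `size` counts non-gap bases, so a
    '+'-strand row covers [start, start + size).
    """
    end = start + size
    return [r.name for r in regions
            if r.chrom == chrom and r.start < end and start < r.end]


# Hypothetical regions; the block coordinates come from the sample hg18 row
# quoted in this thread (chr1, start 14754, size 99).
regions = [
    Region("chr1", 14000, 15000, "NM_HYPOTHETICAL_1"),
    Region("chr1", 20000, 21000, "NM_HYPOTHETICAL_2"),
]
print(names_for_block("chr1", 14754, 99, regions))  # ['NM_HYPOTHETICAL_1']
```

A linear scan keeps the sketch short; for genome-wide data, sorting the regions per chromosome and sweeping would be the natural next step.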
> 
> Galaxy has screencasts:
> http://galaxy.psu.edu/screencasts.html
> 
> and a wiki:
> http://g2.trac.bx.psu.edu/
> 
> This screencast might be particularly helpful:
> http://screencast.g2.bx.psu.edu/galaxy/MAF_manipulation/
> 
> If you have more questions about how to accomplish your task using 
> Galaxy, you can contact them at [email protected].
> 
> Good luck with your research.
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> 
> 
> On 12/10/08 15:17, Anyuan Guo wrote:
>> Dear Brooke,
>>    Thanks very much. I learned a lot about creating custom tracks from your 
>> email. When I follow your instructions to create a custom track for the 
>> 1,000 bp upstream of RefSeq Genes and intersect it with 17-way Cons, I can 
>> download a ~76 MB compressed file. However, the records in this file do not 
>> begin with a RefSeq ID (NM_xxxx). The following are the first 4 lines of 
>> the file:
>> ##maf version=1
>> a score=-55252.000000
>> s hg18.chr1                   14754  99 + 247249719 CTGTGGGTCGGAGCCGGAGCGTCAGAGC---------CACCCACGACCACCGGCACGCC----CCCACCACA-GGGCAGCGTGG-TGTTGAGACAAC------A
>>
>>      In fact, I need a file whose records begin with the RefSeq ID. The 
>> downloaded MAF file 
>> (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz) 
>> matches my requirements exactly, but because some RefSeq sequences have 
>> been updated, the downloaded file is out of date.
>>    The following are the first 4 lines of the downloaded file, which is the format I need:
>> ##maf version=1 scoring=zero
>> a score=0.000000
>> s NM_198943 0 1000 + 1000 GCATTTTAAACCCAAGTG----AAATCTCCTAGG----------CCCTTCATGCCACACTCA-----TCCATCCCTACCTAC--TTGTGTTGCAACCAAGGGCCCCAC
>>
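Both samples use MAF 's' lines, whose whitespace-separated fields are the source name, the 0-based start, the number of non-gap bases, the strand, the source size, and the aligned text. The difference that matters here is the source field: the Table Browser output names the genomic sequence (`hg18.chr1`, with chromosome coordinates), while the download file names the RefSeq upstream sequence (`NM_198943`, covering positions 0 to 1000 of a 1,000-base source). A minimal Python parser for a single 's' line, a sketch that ignores the other MAF line types, could look like this:

```python
from typing import NamedTuple


class MafRow(NamedTuple):
    src: str       # sequence name, e.g. "hg18.chr1" or "NM_198943"
    start: int     # 0-based start of the aligned region within src
    size: int      # number of non-gap bases in the aligned region
    strand: str    # "+" or "-"
    src_size: int  # total length of src
    text: str      # aligned bases, with "-" for gaps


def parse_s_line(line: str) -> MafRow:
    """Parse a MAF 's' line: 's src start size strand srcSize text'."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    return MafRow(src, int(start), int(size), strand, int(src_size), text)


# Aligned text truncated for brevity.
row = parse_s_line("s NM_198943 0 1000 + 1000 GCATTTTAAACCCAAGTG----AAATC")
print(row.src, row.start, row.size, row.src_size)  # NM_198943 0 1000 1000
```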
>> How can I get the up-to-date version of this download file?
>> Thanks.
>>
>> Anyuan
>>
>>
>> Brooke Rhead wrote:
>>> Hello Anyuan,
>>>
>>> The reason the sequence differs between the download file and 
>>> the Table Browser is that the sequence associated with NM_014223 at 
>>> RefSeq has changed since the download file was made.  The items in the 
>>> RefSeq Genes track are updated daily; the download files are generally 
>>> only made once.
>>>
>>> You can see the revision history for any GenBank accession at NCBI:
>>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=NM_014223
>>>
>>> The download file was last updated on 7-7-2007.  I tried blatting the 
>>> NM_014223 sequence from the "Jun 3 2007 1:10 PM" update to the hg18 
>>> assembly, and the sequence aligned starting at the genomic coordinate 
>>> chr1:40,929,952.  The upstream sequence from the file you downloaded 
>>> corresponds to the 1,000 bases upstream of that base.
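Browser positions such as chr1:40,929,952 are 1-based, while BED and MAF use 0-based starts, so it is easy to slip by one base here. The thread does not state the strand of NM_014223; the small Python sketch below assumes '+' and computes the 1,000-base window that ends just before the aligned start:

```python
from typing import Tuple


def upstream_window_plus(pos_1based: int, n: int = 1000) -> Tuple[int, int]:
    """0-based, half-open [start, end) interval of the n bases immediately
    upstream of a 1-based position, assuming the '+' strand."""
    end = pos_1based - 1     # 0-based index of pos; excluded from the window
    start = max(0, end - n)  # clip at the chromosome start
    return start, end


start, end = upstream_window_plus(40_929_952)
print(start, end)                     # 40928951 40929951
print(f"chr1:{start + 1:,}-{end:,}")  # chr1:40,928,952-40,929,951
```

The second print converts back to the browser's 1-based, fully closed notation.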
>>>
>>> You can get an up-to-date version of the download file by creating it 
>>> yourself with the Table Browser.  First, make a custom track of the 
>>> upstream regions of RefSeq Genes.  If you select the RefSeq Genes 
>>> track in the Table Browser and choose "output format: custom track", 
>>> you will be presented with an option to create one BED record per 
>>> region that is "Upstream by ___ bases".  Enter 1,000 or 2,000 in this 
>>> box and hit "get custom track in genome browser".  You should see a 
>>> new custom track containing blocks representing regions upstream of 
>>> all RefSeq Genes.
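The "Upstream by ___ bases" option yields one region per transcript. The Python sketch below shows the strand-aware arithmetic such a region presumably involves; it is an assumption about the result, not the Table Browser's code. The `Transcript` fields mirror the refGene columns `name`, `chrom`, `strand`, `txStart`, and `txEnd`, and the example rows are hypothetical:

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    """Minimal subset of a refGene row (hypothetical example data)."""
    name: str
    chrom: str
    strand: str
    tx_start: int  # 0-based, inclusive
    tx_end: int    # 0-based, exclusive


def upstream_bed(tx: Transcript, n: int, chrom_size: int) -> str:
    """One tab-separated BED6 line covering the n bases upstream of the TSS."""
    if tx.strand == "+":
        # The transcript starts at tx_start; upstream lies to the left.
        start, end = max(0, tx.tx_start - n), tx.tx_start
    else:
        # On the '-' strand the transcript starts at tx_end; upstream lies to the right.
        start, end = tx.tx_end, min(chrom_size, tx.tx_end + n)
    return f"{tx.chrom}\t{start}\t{end}\t{tx.name}\t0\t{tx.strand}"


for tx in [Transcript("NM_HYPOTHETICAL_1", "chr1", "+", 5000, 9000),
           Transcript("NM_HYPOTHETICAL_2", "chr1", "-", 5000, 9000)]:
    print(upstream_bed(tx, 1000, chrom_size=247_249_719))
```

The chromosome size is the hg18 chr1 length from the sample 's' line above; clipping at both chromosome ends keeps every record valid BED.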
>>>
>>> Now you can intersect your new custom track with the multiz alignment 
>>> in the Conservation track to get only the upstream regions.  To do 
>>> this step, select the 17-way (or 28-way) Conservation track in the 
>>> Table Browser.  Select the table 'multiz17way' and region: genome.  
>>> Hit the "intersection: create" button and select your custom track.  
>>> Choose the option for "Base-pair-wise intersection (AND) of 17-Way 
>>> Cons and upstream regions from refGene" and hit submit.  Back on the 
>>> main Table Browser page, select "output format: MAF".  The size of the 
>>> file you will be creating is quite large (76 MB compressed for 1,000 
>>> base regions).  I suggest entering a name for the file and selecting 
>>> the option to get a gzip compressed version of it.  Hit "get output".  
>>> You should end up with a MAF file that contains only the regions 
>>> upstream of RefSeq Genes.
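A base-pair-wise intersection (AND) keeps only the bases covered by both inputs. For two sorted, non-overlapping interval lists on one chromosome it can be sketched in Python as a two-pointer sweep. This shows only the coordinate logic, not how the Table Browser trims the alignment columns of each MAF block, and the example intervals are hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Base-pair-wise AND of two sorted lists of non-overlapping intervals
    on the same chromosome."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Drop whichever interval ends first; the other may still overlap more.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


alignment_blocks = [(0, 100), (200, 300)]  # hypothetical MAF block extents
upstream_regions = [(50, 250)]             # hypothetical upstream region
print(intersect(alignment_blocks, upstream_regions))  # [(50, 100), (200, 250)]
```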
>>>
>>> You may also be interested in the tools for working with MAF 
>>> alignments at Galaxy: http://galaxy.psu.edu/ .  Galaxy is run by our 
>>> collaborators at Penn State and extends the functionality of the Table 
>>> Browser.  For instance, there is a tool to filter any undesired 
>>> species from a MAF file, leaving only the species of interest to you.
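The Galaxy tool handles this properly; the Python sketch below only illustrates the idea. It assumes source names of the form `species.chrom` (as in `hg18.chr1`) and drops the per-species lines ('s', 'i', 'e', 'q') of unwanted species. Unlike a real MAF filter, it does not remove alignment columns that become all gaps or blocks left with a single row. The species names in the example, such as `mm8` for mouse, are assumptions:

```python
from typing import Set


def keep_species(maf_text: str, species: Set[str]) -> str:
    """Return maf_text without the per-species lines of unwanted species.

    Simplified: all-gap columns and single-row blocks are left in place.
    """
    kept = []
    for line in maf_text.splitlines():
        fields = line.split()
        # 's', 'i', 'e' and 'q' lines all carry the source name in field 2.
        if len(fields) > 1 and fields[0] in ("s", "i", "e", "q"):
            if fields[1].split(".", 1)[0] not in species:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


maf = """##maf version=1
a score=0.000000
s hg18.chr1 0 3 + 10 ACG
s mm8.chr4 5 2 + 20 AC-
s panTro2.chr1 0 3 + 10 ACG
"""
print(keep_species(maf, {"hg18", "mm8"}), end="")
```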
>>>
>>> I hope this is helpful.  If you have further questions, please feel 
>>> free to contact us again at [email protected].  If you have 
>>> questions specific to Galaxy, their helpdesk email address is 
>>> [email protected].
>>>
>>> -- 
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>
>>> Subject: question or bug about UCSC genome browser sequence
>>> From: Anyuan Guo <[email protected]>
>>> Date: Mon, 17 Nov 2008 10:54:21 -0800
>>> To: [email protected]
>>>
>>>
>>> Dear author,
>>>     Thank you for providing the wonderful UCSC Genome Browser database
>>> and website.
>>>     I have a question about the sequences in it.
>>>        I downloaded the human upstream 1000bp multiz alignment file from
>>> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz
>>>
>>>     When I checked my sequence ID, NM_014223, I could find the upstream
>>> 1,000 bp sequence of this RefSeq gene in the downloaded multiz alignment
>>> file.
>>>     I can also search for this ID in the Genome Browser and get the
>>> upstream 1,000 bp using the "DNA" or "Tables" menu at the top of the
>>> Genome Browser page.
>>>     However, these two upstream 1,000 bp sequences are completely
>>> different. I think the one from the Genome Browser is correct.
>>>     I need not only the upstream 1,000 bp sequence but also its
>>> alignment with the mouse sequence.
>>>
>>>     Can I get just the human-mouse sequence alignment for all RefSeq
>>> genes and the 1,000 or 2,000 bp upstream of these genes? Where can I
>>> find it?
>>>     I think such alignments of orthologous genes (including their
>>> upstream regulatory sequences) between two popular genomes would be
>>> very useful.
>>>
>>> Thanks.
>>>
>>> Anyuan
>>>
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome