Hi again, Anyuan,

One of our developers created updated versions of the hg18 17-way
upstream MAF files for you. They are located here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/

The time stamps on the files are:

upstream1000.maf.gz    12-Dec-2008 13:51    52M
upstream2000.maf.gz    12-Dec-2008 13:51   124M
upstream5000.maf.gz    12-Dec-2008 13:52   272M

I hope this helps!

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 12/10/08 13:12, Brooke Rhead wrote:
> Hi Anyuan,
>
> I don't believe it is possible to retain the RefSeq ID in this case when
> using the Table Browser. However, I think that Galaxy has this
> capacity, either by doing the intersection from scratch using their
> tools, or by joining your MAF with your custom track based on the genome
> coordinates.
>
> Galaxy has screencasts:
> http://galaxy.psu.edu/screencasts.html
>
> and a wiki:
> http://g2.trac.bx.psu.edu/
>
> This screencast might be particularly helpful:
> http://screencast.g2.bx.psu.edu/galaxy/MAF_manipulation/
>
> If you have more questions about how to accomplish your task using
> Galaxy, you can contact them at [email protected].
>
> Good luck with your research.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> a lot of at [email protected] for help.
>
> On 12/10/08 15:17, Anyuan Guo wrote:
>> Dear Brooke,
>> Thanks very much. I learned a lot about creating custom tracks from your
>> email. I can download a ~76 Mb compressed file when I follow your
>> instructions to create a custom track for the upstream 1000 bp of RefSeq
>> Genes and intersect it with 17-way Cons. But I found that the file does not
>> begin with the RefSeq ID (NM_xxxx). The following are the first 4 lines of
>> the file.
>> ##maf version=1
>> a score=-55252.000000
>> s hg18.chr1 14754 99 + 247249719
>> CTGTGGGTCGGAGCCGGAGCGTCAGAGC---------CACCCACGACCACCGGCACGCC----CCCACCACA-GGGCAGCGTGG-TGTTGAGACAAC------A
>>
>>
>>
>> In fact, I need a file that begins with the RefSeq ID; the downloaded MAF
>> file
>> (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz)
>> exactly matches my requirement.
>> But because some RefSeq sequences were
>> updated, the downloaded file is out of date.
>> The following are the first 4 lines of the downloaded file, which is what
>> I need.
>> ##maf version=1 scoring=zero
>> a score=0.000000
>> s NM_198943 0 1000 + 1000
>> GCATTTTAAACCCAAGTG----AAATCTCCTAGG----------CCCTTCATGCCACACTCA-----TCCATCCCTACCTAC--TTGTGTTGCAACCAAGGGCCCCAC
>>
>>
>>
>> How can I get the up-to-date version of this download file?
>> Thanks.
>>
>> Anyuan
>>
>>
>> Brooke Rhead wrote:
>>> Hello Anyuan,
>>>
>>> The reason that the sequence is different via the download file and
>>> the Table Browser is that the sequence associated with NM_014223 at
>>> RefSeq has changed since the download file was made. The items in the
>>> RefSeq Genes track are updated daily; the download files are generally
>>> only made once.
>>>
>>> You can see the revision history for any GenBank accession at NCBI:
>>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=NM_014223
>>>
>>> The download file was last updated on 7-7-2007. I tried blatting the
>>> NM_014223 sequence from the "Jun 3 2007 1:10 PM" update to the hg18
>>> assembly, and the sequence aligned starting at the genomic coordinate
>>> chr1:40,929,952. The upstream sequence from the file you downloaded
>>> corresponds to the 1,000 bases upstream of that base.
>>>
>>> You can get an up-to-date version of the download file by creating
>>> it yourself with the Table Browser. First, make a custom track of the
>>> upstream regions of RefSeq Genes. If you select the RefSeq Genes
>>> track in the Table Browser and choose "output format: custom track",
>>> you will be presented with an option to create one BED record per
>>> region that is "Upstream by ___ bases". Enter 1,000 or 2,000 in this
>>> box and hit "get custom track in genome browser". You should see a
>>> new custom track containing blocks representing regions upstream of
>>> all RefSeq Genes.
>>>
>>> Now you can intersect your new custom track with the multiz alignment
>>> in the Conservation track to get only the upstream regions. To do
>>> this step, select the 17-way (or 28-way) Conservation track in the
>>> Table Browser. Select the table 'multiz17way' and region: genome.
>>> Hit the "intersection: create" button and select your custom track.
>>> Choose the option for "Base-pair-wise intersection (AND) of 17-Way
>>> Cons and upstream regions from refGene" and hit submit. Back on the
>>> main Table Browser page, select "output format: MAF". The size of the
>>> file you will be creating is quite large (76 Mb compressed for 1,000
>>> base regions). I suggest entering a name for the file and selecting
>>> the option to get a gzip compressed version of it. Hit "get output".
>>> You should end up with a MAF file that contains only the regions
>>> upstream of RefSeq Genes.
>>>
>>> You may also be interested in the tools for working with MAF
>>> alignments at Galaxy: http://galaxy.psu.edu/ . Galaxy is run by our
>>> collaborators at Penn State and extends the functionality of the Table
>>> Browser. For instance, there is a tool to filter any undesired
>>> species from a MAF file, leaving only the species of interest to you.
>>>
>>> I hope this is helpful. If you have further questions, please feel
>>> free to contact us again at [email protected]. If you have
>>> questions specific to Galaxy, their helpdesk email address is
>>> [email protected].
>>>
>>> --
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>
>>> Subject: question or bug about UCSC genome browser sequence
>>> From: Anyuan Guo <[email protected]>
>>> Date: Mon, 17 Nov 2008 10:54:21 -0800
>>> To: [email protected]
>>>
>>>
>>> Dear author,
>>> Thank you for providing the wonderful database and website of the UCSC
>>> Genome Browser.
>>> I have a question about the sequences in it.
>>> I downloaded the human upstream 1000 bp multiz alignment file from
>>> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz17way/upstream1000.maf.gz
>>>
>>>
>>> When I check my sequence ID, NM_014223,
>>> I can find the upstream 1000 bp sequence of this RefSeq gene in the
>>> downloaded multiz alignment file.
>>> I can also search for this ID in the Genome Browser and get the upstream
>>> 1000 bp using the "DNA" or "Tables" menu at the top of the Genome Browser
>>> page.
>>> But I find that these two upstream 1000 bp sequences are totally
>>> different. I think the one from the Genome Browser is right.
>>> But I do not need just the upstream 1000 bp sequence; I need the
>>> alignment with the mouse sequence.
>>>
>>> Can I just get the sequence alignment between human and mouse for
>>> all the RefSeq genes and the upstream 1000 or 2000 bp of these genes?
>>> Where can I find it?
>>> I think such ortholog gene alignments (including upstream
>>> regulatory sequence alignments) between two popular genomes would be
>>> very useful.
>>>
>>> Thanks.
>>>
>>> Anyuan
>>>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
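
For readers who want only the human and mouse rows from an upstream MAF file like the ones discussed in this thread, here is a minimal Python sketch. It relies only on the standard MAF layout (an "a" line opens a block, each "s" line has seven whitespace-separated fields). The names MafRow, read_maf_blocks and blocks_by_first_row are hypothetical helpers, not part of any UCSC or Galaxy tool. The sample rows are invented for illustration, and the assumption that mouse rows are named with an "mm8" prefix should be checked against the real file.

```python
import io
from typing import Dict, Iterator, List, NamedTuple, TextIO, Tuple


class MafRow(NamedTuple):
    src: str        # sequence name, e.g. "NM_198943" or "hg18.chr1"
    start: int
    size: int       # number of non-gap bases in this row
    strand: str
    src_size: int
    text: str       # aligned bases, with "-" marking gaps


def read_maf_blocks(handle: TextIO) -> Iterator[List[MafRow]]:
    """Yield each alignment block as the list of its "s" rows."""
    rows: List[MafRow] = []
    for line in handle:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the current block.
            if rows:
                yield rows
            rows = []
        elif fields[0] == "s" and len(fields) == 7:
            _, src, start, size, strand, src_size, text = fields
            rows.append(
                MafRow(src, int(start), int(size), strand, int(src_size), text)
            )
    if rows:
        yield rows


def blocks_by_first_row(
    handle: TextIO, species_prefixes: Tuple[str, ...]
) -> Dict[str, List[List[MafRow]]]:
    """Group blocks by the name of their first row (the RefSeq ID in the
    upstream files), keeping only other rows whose name starts with one of
    species_prefixes."""
    grouped: Dict[str, List[List[MafRow]]] = {}
    for rows in read_maf_blocks(handle):
        kept = [rows[0]] + [
            r for r in rows[1:] if r.src.startswith(species_prefixes)
        ]
        grouped.setdefault(rows[0].src, []).append(kept)
    return grouped


# Invented sample data in MAF layout; not taken from the real download file.
SAMPLE = """##maf version=1 scoring=zero
a score=0.000000
s NM_198943 0 10 + 1000 GCATTT--TAAA
s mm8       0  9 + 1000 GCAT-T--TAAA
s canFam2   0 10 + 1000 GCATTT--TAAG

a score=0.000000
s NM_014223 0 8 + 1000 ACGTACGT
s mm8       0 8 + 1000 ACGAACGT
"""

if __name__ == "__main__":
    grouped = blocks_by_first_row(io.StringIO(SAMPLE), ("mm8",))
    for name, blocks in grouped.items():
        for block in blocks:
            for row in block:
                print(name, row.src, row.text)
```

To run this on a real download, the StringIO handle could be swapped for gzip.open("upstream1000.maf.gz", "rt") from the standard library. Matching by prefix rather than by exact name is a deliberate choice so that the sketch works whether rows are named "mm8" or "mm8.chrN".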
