Hello John,

Sorry to hear about the problems transferring data over to Galaxy. It is a
long process. It would be much better to download the files locally and use
some of the utilities from the kent source tree.
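As a rough illustration of how the local approach could be scripted, here is a
minimal sketch built around mafsInRegion, following its usage line
"mafsInRegion regions.bed out.maf|outDir in.maf(s)". The helper names, the
file names (regions.bed, out.maf, chr10.maf) and the example coordinates are
hypothetical, and the 4-column BED layout is an assumption based on the
-outDir option naming files by the BED name field. BED coordinates are
0-based and half-open.

```python
import shutil
import subprocess
from pathlib import Path


def write_bed(regions, bed_path):
    """Write (chrom, start, end, name) tuples as a tab-separated 4-column BED file."""
    lines = [f"{chrom}\t{start}\t{end}\t{name}" for chrom, start, end, name in regions]
    Path(bed_path).write_text("\n".join(lines) + "\n")


def maf_command(bed_path, out_maf, in_mafs):
    """Build the argument list: mafsInRegion regions.bed out.maf in.maf(s)."""
    return ["mafsInRegion", str(bed_path), str(out_maf), *[str(m) for m in in_mafs]]


def extract_mafs(regions, in_mafs, out_maf, bed_path="regions.bed"):
    """Write the BED file, then run mafsInRegion on locally downloaded MAF files."""
    if shutil.which("mafsInRegion") is None:
        # The kent utilities must be installed and on PATH for this step.
        raise FileNotFoundError("mafsInRegion not found on PATH")
    write_bed(regions, bed_path)
    subprocess.run(maf_command(bed_path, out_maf, in_mafs), check=True)
```

With the kent utilities installed and a downloaded alignment file, a call
might look like extract_mafs([("chr10", 1000, 2000, "oreg1")], ["chr10.maf"],
"out.maf"), where the region and file names are placeholders.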
Some popular MAF options are below. You could also look at the maf* programs
from the utility set and see which will perform the exact function(s) that
you need:

http://genomewiki.cse.ucsc.edu/index.php/Kent_source_utilities

Hopefully this helps!
Jennifer

-------

$ mafFrags
mafFrags - Collect MAFs from regions specified in a 6 column bed file
usage:
   mafFrags database track in.bed out.maf
options:
   -orgs=org.txt - File with list of databases/organisms in order
   -bed12 - If set, in.bed is a bed 12 file, including exons
   -thickOnly - Only extract subset between thickStart/thickEnd
   -meFirst - Put native sequence first in maf
   -txStarts - Add MAF txstart region definitions ('r' lines) using BED name
               and output actual reference genome coordinates in MAF.
   -refCoords - output actual reference genome coordinates in MAF.

$ mafsInRegion
mafsInRegion - Extract MAFS in a genomic region
usage:
   mafsInRegion regions.bed out.maf|outDir in.maf(s)
options:
   -outDir - output separate files named by bed name field to outDir
   -keepInitialGaps - keep alignment columns at the beginning and of a block
                      that are gapped in all species

$ mafSplit
mafSplit - Split multiple alignment files
usage:
   mafSplit splits.bed outRoot file(s).maf
options:
   -byTarget         Make one file per target sequence. (splits.bed input
                     is ignored).
   -outDirDepth=N    For use only with -byTarget.
                     Create N levels of output directory under current dir.
                     This helps prevent NFS problems with a large number of
                     file in a directory. Using -outDirDepth=3 would
                     produce ./1/2/3/outRoot123.maf.
   -useSequenceName  For use only with -byTarget.
                     Instead of auto-incrementing an integer to determine
                     output filename, expect each target sequence name to
                     end with a unique number and use that number as the
                     integer to tack onto outRoot.
   -useHashedName=N  For use only with -byTarget.
                     Instead of auto-incrementing an integer or requiring a
                     unique number in the sequence name, use a hash function
                     on the sequence name to compute an N-bit number.
                     This limits the max #filenames to 2^N and ensures that
                     even if different subsets of sequences appear in
                     different pairwise mafs, the split file names will be
                     consistent (due to hash function). This option is
                     useful when a "scaffold-based" assembly has more than
                     one sequence name pattern, e.g. both chroms and
                     scaffolds.

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/2/10 2:52 AM, John Reid wrote:
> Hi,
>
> I'm trying to retrieve regions that are aligned to oRegAnno annotations
> in several species. Initially I've started with mouse. I can use the
> UCSC DAS capability to retrieve the oRegAnno features and retrieve the
> DNA sequences for them. I can't work out how to get to bases in other
> species using DAS or any other method that I can script.
>
> I found this advice from Jennifer Jackson in an old post on this newsgroup:
>> Using the tools at UCSC, the Table Browser will return blocks of
>> Conservation MAF results, but not specific bases. However, by sending the
>> data over to Galaxy, "slices" of the Conservation track's MAF alignment can
>> be retrieved in batch using a custom track of intervals (down to a single
>> base).
>>
>> To do this:
>>
>> 1) Create and load a custom track in BED format of the genome positions of
>> interest
>> 2) Send the custom track to Galaxy by extracting it from the Table browser
>> and checking Galaxy as the output choice
>> 3) Send the Conservation track's MAF alignment data to Galaxy using the same
>> method (you may need to subset this by chromosome to improve
>> speed/performance)
>> 4) Use the Galaxy tools: Fetch Alignments -> Extract MAF blocks given a set
>> of genomic intervals
>>
>> UCSC help is as follows:
>>
>> http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
>> http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#TableBrowser
>> http://genome.ucsc.edu/FAQ/FAQformat#format1
>> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>>
>> Galaxy help is available at their web site if you have questions about the
>> tools.
>>
>
> I tried this but it took hours just to load the conservation track for
> mouse chromosome 10 into galaxy. I have several organisms, each with
> many chromosomes. Is there a better way to do it? It's frustrating
> because with a few clicks on the Genome Browser web interface I can
> retrieve the information I need for one particular region but I can't
> work out how to write a script to retrieve it.
>
> Thanks in advance,
> John.
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
