Re: [Genome] Create multiple alignment of 12 Drosophila species genomes

Jim Kent Tue, 23 Mar 2010 08:31:23 -0700

It removes the lines for the species you don't need.  It will
remove any columns that are all dashes after the lines are removed,
corresponding to insertions that are unique to the species removed.
It does not realign.  In general, if a multiple alignment program is
working well, you actually get better alignments by including more species,
even if at a later stage you remove them in this way.
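The row-and-column removal described above can be sketched in Python as follows. This is a hypothetical illustration, not the kent source code. It assumes each alignment block has already been parsed into (src, aligned text) pairs, it ignores the start/size/strand/srcSize fields and any i/e/q lines, and the UCSC-style source names in the example are illustrative only. Every dropped column is a gap in all remaining rows, so the coordinates of the remaining sequences are unaffected.

```python
from typing import List, Set, Tuple

# Simplified MAF block: one (src, aligned_text) pair per "s" line.
# Assumption: start/size/strand/srcSize fields and i/e/q lines are ignored.
Row = Tuple[str, str]

def species_of(src: str) -> str:
    """Return the part of a MAF source name before the first dot, e.g. 'dm3.chr2L' -> 'dm3'."""
    return src.split(".", 1)[0]

def subset_block(rows: List[Row], keep: Set[str]) -> List[Row]:
    """Keep only rows whose species is in `keep`, then drop columns that are all gaps.

    No realignment is done: the remaining rows keep their original column
    layout, except for columns that are now '-' or '.' in every row.
    """
    kept = [(src, text) for src, text in rows if species_of(src) in keep]
    if not kept:
        return []
    width = len(kept[0][1])
    # A column survives if at least one remaining row has a base there.
    keep_cols = [i for i in range(width)
                 if any(text[i] not in "-." for _, text in kept)]
    return [(src, "".join(text[i] for i in keep_cols)) for src, text in kept]

if __name__ == "__main__":
    block = [
        ("dm3.chr2L", "ACG--TA"),
        ("droSim1.chr2L", "ACG--TA"),
        ("anoGam1.chr3R", "ACGTTTA"),  # insertion unique to the removed species
    ]
    for src, text in subset_block(block, {"dm3", "droSim1"}):
        print(src, text)
```

In this example the two columns that held only the anoGam1 insertion disappear, and the two kept rows become ACGTA.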


On Mar 22, 2010, at 9:59 PM, 董珊 wrote:

> Hello,
>
> Thank you very much for your detailed explanation, it really helps  
> me a lot.
>
> But I still have a question about the mafSpeciesSubset program you
> advised. Does it simply remove the blocks or lines in the maf file that
> contain the species I don't need? Or does it remove the unwanted species
> and then redo the multiple alignment?
>
> Given my limited knowledge, I am a bit concerned about how much
> information the maf result retains if this program only removes the
> blocks or lines containing the unwanted species without redoing the
> multiple alignment. I want to use this multiple alignment of 12
> Drosophila genomes to extract genetic variations (i.e. substitutions
> and indels).
>
> I would be grateful if you could tell me more about the method this
> program uses to remove species.
>
> Thank you very much in advance!
>
> Best regards!
> Shan
>
>
> 2010/3/23 Jim Kent <[email protected]>
> I think the mafSpeciesSubset program might actually be more what is  
> wanted.  Here's the command line:
>
> mafSpeciesSubset - Extract a maf that just has a subset of species.
> usage:
>   mafSpeciesSubset in.maf species.lst out.maf
> Where:
>    in.maf is a file where the sequence source are either simple species
>           names, or species.something.  Usually actually it's a genome
>           database name rather than a species before the dot to tell the
>           truth.
>    species.lst is a file with a list of species to keep
>    out.maf is the output.  It will have columns that are all - or . in
>           the reduced species set removed, as well as the lines representing
>           species not in species.lst removed.
> options:
>   -keepFirst - If set, keep the first 'a' line in a maf no matter what
>                Useful for mafFrag results where we use this for the gene name
>
>
>
> On Mar 22, 2010, at 12:10 PM, Jennifer Jackson wrote:
>
> Hello,
>
> The utility mafFilter is the best choice: use option -speciesFilter.  
> You
> do not need to set up the entire code tree or a mirror to use the
> utilities in the kent source tree.
>
> http://genomewiki.ucsc.edu/index.php/The_source_tree
> http://hgdownload.cse.ucsc.edu/downloads.html -> Source
>
> mafFilter - Filter out maf files. Output goes to standard out
> usage:
>   mafFilter file(s).maf
> options:
>   -tolerate - Just ignore bad input rather than aborting.
>   -minCol=N - Filter out blocks with fewer than N columns (default 1)
>   -minRow=N - Filter out blocks with fewer than N rows (default 2)
>   -factor - Filter out scores below -minFactor * (ncol**2) * nrow
>   -minFactor=N - Factor to use with -minFactor (default 5)
>   -minScore=N - Minimum allowed score (alternative to -minFactor)
>   -reject=filename - Save rejected blocks in filename
>   -needComp=species - all alignments must have species as one of the component
>   -overlap - Reject overlapping blocks in reference (assumes ordered blocks)
>   -componentFilter=filename - Filter out blocks without a component listed in filename
>   *-speciesFilter=filename* - Filter out blocks without a species listed in filename
>
> Hopefully this helps,
> Jennifer
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
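For contrast, here is a sketch of block-level filtering, based on one reading of the -speciesFilter help line: whole blocks are kept or dropped, and rows are never removed from a kept block. This reading is an assumption drawn only from the one-line help text, not from mafFilter's source, and the block representation and function names are hypothetical.

```python
from typing import List, Set, Tuple

# One parsed "s" line: (src, aligned_text). Hypothetical representation.
Row = Tuple[str, str]
Block = List[Row]

def species_of(src: str) -> str:
    """Return the part of a MAF source name before the first dot."""
    return src.split(".", 1)[0]

def species_filter(blocks: List[Block], wanted: Set[str]) -> List[Block]:
    """Keep whole blocks that contain at least one wanted species.

    Assumed reading of the -speciesFilter help text: rows are never removed
    from a kept block, so unwanted species remain in the output.
    """
    return [b for b in blocks if any(species_of(src) in wanted for src, _ in b)]

if __name__ == "__main__":
    blocks = [
        [("dm3.chr2L", "ACGT"), ("anoGam1.chr3R", "ACGA")],
        [("anoGam1.chr2R", "TTGA"), ("triCas2.chrX", "TTGC")],
    ]
    for block in species_filter(blocks, {"dm3"}):
        print(block)  # the anoGam1 row is still present in the kept block
```

Under this reading, block filtering alone would leave the non-Drosophila rows in place. That is consistent with the follow-up suggestion to use mafSpeciesSubset instead.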
> On 3/22/10 1:42 AM, 董珊 wrote:
> Dear Genome list,
>
> I want to ask a question about creating a multiple alignment from
> pairwise alignments from the UCSC Genome Browser.
> The whole-genome multiple alignment I need covers 12 Drosophila species,
> but the multiple alignment data on the UCSC Genome Browser contain 3 extra
> species (A. gambiae, A. mellifera and T. castaneum).
>
> I know there is a complex method that uses multiz/TBA to build a multiple
> alignment from pairwise alignments. However, these pairwise alignments
> must be in MAF format, and it seems difficult to get the format conversion
> tool from the Kent source.
> Do you know another method to remove the 3 extra species from the
> multiple alignment on the UCSC Genome Browser?
>
> Thank you very much in advance!
>
> Best regards!
> Shan
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
