Hello,

Thank you very much for your detailed explanation; it really helps me a lot.

But I still have a question about the mafSpeciesSubset program you
advised:
Does it simply remove the blocks or lines that contain the species I don't
need in the MAF file?
Or does it remove the unwanted species and then redo the multiple alignment?

Because my knowledge is limited, I am a bit concerned about how much
information the MAF result keeps
if this program only removes the blocks or lines containing the unwanted
species without redoing the multiple alignment.
This matters because I want to use this multiple alignment of 12 Drosophila
genomes to extract genetic variations (i.e. substitutions and indels).

I would be grateful if you could tell me more about the method this program
uses to remove species.

Thank you very much in advance!

Best regards!
Shan


2010/3/23 Jim Kent <[email protected]>

> I think the mafSpeciesSubset program might actually be more what is wanted.
>  Here's the command line:
>
> mafSpeciesSubset - Extract a maf that just has a subset of species.
> usage:
>   mafSpeciesSubset in.maf species.lst out.maf
> Where:
>    in.maf is a file where the sequence source are either simple species
>           names, or species.something.  Usually actually it's a genome
>           database name rather than a species before the dot to tell the
>           truth.
>    species.lst is a file with a list of species to keep
>    out.maf is the output.  It will have columns that are all - or . in
>           the reduced species set removed, as well as the lines representing
>           species not in species.lst removed.
> options:
>   -keepFirst - If set, keep the first 'a' line in a maf no matter what
>                Useful for mafFrag results where we use this for the gene name
>
>
>
> On Mar 22, 2010, at 12:10 PM, Jennifer Jackson wrote:
>
>> Hello,
>>
>> The utility mafFilter is the best choice: use option -speciesFilter. You
>> do not need to set up the entire code tree or a mirror to use the
>> utilities in the kent source tree.
>>
>> http://genomewiki.ucsc.edu/index.php/The_source_tree
>> http://hgdownload.cse.ucsc.edu/downloads.html -> Source
>>
>> mafFilter - Filter out maf files. Output goes to standard out
>> usage:
>>   mafFilter file(s).maf
>> options:
>>   -tolerate - Just ignore bad input rather than aborting.
>>   -minCol=N - Filter out blocks with fewer than N columns (default 1)
>>   -minRow=N - Filter out blocks with fewer than N rows (default 2)
>>   -factor - Filter out scores below -minFactor * (ncol**2) * nrow
>>   -minFactor=N - Factor to use with -minFactor (default 5)
>>   -minScore=N - Minimum allowed score (alternative to -minFactor)
>>   -reject=filename - Save rejected blocks in filename
>>   -needComp=species - all alignments must have species as one of the component
>>   -overlap - Reject overlapping blocks in reference (assumes ordered blocks)
>>   -componentFilter=filename - Filter out blocks without a component listed in filename
>>   *-speciesFilter=filename* - Filter out blocks without a species listed in filename
>>
>> Hopefully this helps,
>> Jennifer
>>
>> ---------------------------------
>> Jennifer Jackson
>> UCSC Genome Informatics Group
>> http://genome.ucsc.edu/
>>
>> On 3/22/10 1:42 AM, 董珊 wrote:
>>
>>> Dear Genome list,
>>>
>>> I want to ask a question about creating a multiple alignment from pairwise
>>> alignments from the UCSC Genome Browser.
>>> The whole-genome multiple alignment I need covers 12 Drosophila species,
>>> but the multiple alignment data on the UCSC Genome Browser contain 3 extra
>>> species (A. gambiae, A. mellifera and T. castaneum).
>>>
>>> I know there is a complex method that uses multiz/TBA to build a multiple
>>> alignment from pairwise alignments.
>>> However, these pairwise alignments must be in MAF format,
>>> and it seems difficult to get the format conversion tool from the Kent source.
>>> Do you know another method to remove the 3 extra species from the
>>> UCSC Genome Browser multiple alignment?
>>>
>>> Thank you very much in advance!
>>>
>>> Best regards!
>>> Shan
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
>>
>
>
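To illustrate the difference discussed in this thread, here is a minimal Python sketch based only on the help texts quoted above, not on the Kent source. As those texts read, mafSpeciesSubset drops the rows of unlisted species and then drops columns that are entirely '-' or '.' in the remaining rows, with no realignment. mafFilter -speciesFilter instead keeps or rejects whole blocks and leaves their rows alone. The data model (a block as a list of (src, aligned text) pairs from its 's' lines), the function names, the empty-block behavior, and the database names in the example are hypothetical, chosen for illustration only.

```python
"""Hypothetical sketch of the two filtering behaviors described in the thread.

Assumptions (not taken from the Kent source): a MAF block is modeled as a
list of (src, aligned_text) pairs from its 's' lines; start, size, strand,
srcSize and any 'i'/'e'/'q' lines are ignored for brevity.
"""

from typing import List, Set, Tuple

Row = Tuple[str, str]  # (src such as "dm3.chr2L", aligned sequence text)
Block = List[Row]


def species_of(src: str) -> str:
    """Return the part of a MAF source name before the first dot."""
    return src.split(".", 1)[0]


def species_subset(block: Block, keep: Set[str]) -> Block:
    """Drop rows of unlisted species, then drop columns that are all '-' or '.'.

    No realignment happens: the remaining rows keep their original columns.
    Returning [] for a block with no kept rows is an assumption.
    """
    rows = [(src, text) for src, text in block if species_of(src) in keep]
    if not rows:
        return []
    width = len(rows[0][1])
    # A column survives if at least one remaining row has a base in it.
    kept_cols = [
        i for i in range(width)
        if any(text[i] not in "-." for _, text in rows)
    ]
    return [(src, "".join(text[i] for i in kept_cols)) for src, text in rows]


def species_filter(blocks: List[Block], wanted: Set[str]) -> List[Block]:
    """Keep whole blocks that contain at least one listed species; rows untouched."""
    return [
        block for block in blocks
        if any(species_of(src) in wanted for src, _ in block)
    ]


if __name__ == "__main__":
    # Illustrative block: two Drosophila rows plus one Anopheles row.
    block = [
        ("dm3.chr2L", "ACG--TA"),
        ("droSim1.chr2L", "ACG--TA"),
        ("anoGam1.chr2R", "A-GCCTA"),
    ]
    for src, text in species_subset(block, {"dm3", "droSim1"}):
        print(src, text)
```

Running the example prints `dm3.chr2L ACGTA` and `droSim1.chr2L ACGTA`. The two gap columns that existed only because of the dropped anoGam1 row disappear, and no bases move between columns, so the relative alignment of the remaining rows is unchanged. It may still differ from what a fresh multiz run on only those species would produce.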
