Hey Zhenguo,

I think the mafsInRegion program will suit your purposes better than mafFrags. mafsInRegion does not coalesce blocks, so it preserves the genomic coordinates of the query species, and it won't put in the '.'s.
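If it helps, here is a rough Python sketch (not part of the UCSC tools) of how you might scan a MAF file whose blocks keep their per-species coordinates and flag places where a query species jumps to a different location between consecutive blocks. It relies only on the standard MAF "s" line layout (src, start, size, strand, srcSize, text). The names parse_maf, find_breaks and max_gap are made up for this example, and the definition of a "break" used here (a change of sequence or strand, or a start that does not follow the previous end) is my assumption, so adjust it to whatever you consider artificial.

```python
from collections import namedtuple

# One "s" line of a MAF block: s src start size strand srcSize text
Seq = namedtuple("Seq", "src start size strand src_size text")


def parse_maf(lines):
    """Yield each alignment block as a list of Seq records ('s' lines only)."""
    block = []
    for line in lines:
        line = line.strip()
        if line == "a" or line.startswith("a "):
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            block.append(Seq(src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def find_breaks(blocks, max_gap=0):
    """Report query species whose position is not contiguous across blocks.

    Assumption: the first 's' line of each block is the reference and is skipped.
    max_gap is a hypothetical tolerance (in bases) for unaligned query sequence.
    """
    last = {}    # species -> (src, strand, end) seen in the most recent block
    breaks = []
    for i, block in enumerate(blocks):
        for seq in block[1:]:
            species = seq.src.split(".")[0]
            prev = last.get(species)
            if prev is not None:
                src, strand, end = prev
                contiguous = (
                    src == seq.src
                    and strand == seq.strand
                    and 0 <= seq.start - end <= max_gap
                )
                if not contiguous:
                    breaks.append((i, species, prev, (seq.src, seq.strand, seq.start)))
            last[species] = (seq.src, seq.strand, seq.start + seq.size)
    return breaks


# Tiny invented example: rheMac2 switches chromosome and strand in the third block.
EXAMPLE = """\
a score=100
s hg18.chr1 100 5 + 1000 ACGTA
s rheMac2.chr1 200 5 + 2000 ACGTA

a score=90
s hg18.chr1 105 4 + 1000 ACGT
s rheMac2.chr1 205 4 + 2000 ACGT

a score=80
s hg18.chr1 109 3 + 1000 ACG
s rheMac2.chr7 50 3 - 2000 ACG
"""

for brk in find_breaks(list(parse_maf(EXAMPLE.splitlines()))):
    print(brk)
```

Keep in mind that MAF start coordinates are zero-based and, for '-' strand lines, are relative to the reverse-complemented sequence, so the contiguity check only compares positions within the same strand.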
Our standard alignment procedure calls the chainNet program (used to create the nets from the chains) with a parameter that sets the minimum gap size to fill to one base. Whether N's are considered to be a gap or not depends on how many of them there are, and whether the chaining process included them in the chain. In general, if there are only a few N's, the chains will include them, so they won't be considered a gap in the net.

I hope this answers your questions. If you have follow-up questions, please address them to this list.

Brian

On Thu, Feb 3, 2011 at 9:51 AM, Zhenguo Zhang <[email protected]> wrote:
> Dear Colleagues,
>
> I am trying to get the orthologous/homologous regions from hg18 multiz28way
> maf files for a list of human coordinates. I have read the documents and
> track descriptions, but still have several questions:
>
> 1. The results from mafFrags lost the genomic coordinate information for all
> the query species, and one maf block for each coordinate region in input bed
> file. It may contain genomic breaks in some species, that is, the
> corresponding sequence for a query species (say, rheMac2) in the maf block
> is composed of different genomic locations (from the same or different
> chromosomes). I derive this based on the net alignment construction
> procedure. I need to know this information and exclude these cases because
> they are artificial and not true sequences. How can I get the genomic breaks
> in the results from mafFrags? When I checked the alignment in the UCSC
> browser, they are given in different blocks, which seems based on genomic
> breaks in any species.
>
> 2. In the mafFrags results, there are two symbols indicating gaps. One is
> dash '-', and the other is dot '.'. I think dash is the real gaps in query
> sequences, but what does the dot represent? Does it represent unsequenced
> regions or gaps too?
>
> 3. In the net alignment construction, the gaps in the top level chain are
> filled by trimmed lower-scoring chains. What is the minimum length for gaps
> to be filled? Is 1-base long gap in top level chain also filled by
> lower-scoring chain?
>
> 4. During net alignment process, are the unsequenced regions (N's) regarded
> as gaps and filled during this process?
>
> Thank you in advance!
>
> Zhenguo
> --
> ----------------------
> Zhenguo Zhang
> Postdoctoral Scholar
> Institute of Molecular Evolutionary Genetics
> Penn State University
> 312 Mueller Lab, University Park, PA 16802
> Tel: 814-865-2796
> Homepage: http://www.personal.psu.edu/zuz17/
> Lab: http://homes.bio.psu.edu/people/Faculty/Nei/
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
