Dear Colleagues,

I am trying to extract the orthologous/homologous regions from the hg18
multiz28way MAF files for a list of human coordinates. I have read the
documentation and track descriptions, but I still have several questions:

1. The output from mafFrags drops the genomic coordinates for all the query
species and gives a single MAF block for each region in the input BED file.
Such a block may span genomic breaks in some species; that is, the sequence
shown for a query species (say, rheMac2) may be stitched together from
different genomic locations (on the same or different chromosomes). I infer
this from the way the net alignments are constructed. I need to identify
these cases and exclude them, because the stitched sequences are artificial
and not true contiguous sequences. How can I recover the genomic breaks from
the mafFrags output? When I view the same alignment in the UCSC browser, it
is shown as separate blocks, which seem to be split wherever any species has
a genomic break.
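
To make clear what I mean by a genomic break, here is a minimal Python
sketch of the check I have in mind. It assumes ordinary MAF blocks whose "s"
lines still carry coordinates (src, start, size, strand), which is exactly
what the mafFrags output lacks. The function names and the example
coordinates are my own, made up for illustration; they are not part of any
UCSC tool.

```python
def parse_maf_blocks(lines):
    """Yield one dict per MAF block: species -> (chrom, start, size, strand)."""
    block = {}
    for line in lines:
        line = line.strip()
        if line == "a" or line.startswith("a "):
            # An "a" line opens a new alignment block.
            if block:
                yield block
            block = {}
        elif line.startswith("s "):
            # s <src> <start> <size> <strand> <srcSize> <text>
            _, src, start, size, strand = line.split()[:5]
            species, _, chrom = src.partition(".")
            block[species] = (chrom, int(start), int(size), strand)
    if block:
        yield block


def find_breaks(blocks, species):
    """Return indices of blocks where `species` is not contiguous with
    the previous block in which it appeared."""
    breaks = []
    prev = None
    for index, block in enumerate(blocks):
        cur = block.get(species)
        if cur is None:
            continue  # species absent from this block
        if prev is not None:
            p_chrom, p_start, p_size, p_strand = prev
            chrom, start, size, strand = cur
            contiguous = (
                chrom == p_chrom
                and strand == p_strand
                and start == p_start + p_size
            )
            if not contiguous:
                breaks.append(index)
        prev = cur
    return breaks


# Hypothetical three-block example; coordinates are invented.
EXAMPLE = """\
##maf version=1
a score=1
s hg18.chr1 100 5 + 1000 ACGTA
s rheMac2.chr1 200 5 + 2000 ACGTA

a score=2
s hg18.chr1 105 4 + 1000 CCGG
s rheMac2.chr1 205 4 + 2000 CCGG

a score=3
s hg18.chr1 109 3 + 1000 TTT
s rheMac2.chr7 50 3 - 3000 TTT
"""

blocks = list(parse_maf_blocks(EXAMPLE.splitlines()))
print(find_breaks(blocks, "rheMac2"))  # [2]
print(find_breaks(blocks, "hg18"))     # []
```

In this toy input the third block switches rheMac2 from chr1 to chr7, so it
is reported as a break, while hg18 stays contiguous throughout. This is the
kind of information I would like to recover for each mafFrags block.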

2. The mafFrags output uses two symbols that appear to indicate gaps: the
dash '-' and the dot '.'. I assume the dash marks true alignment gaps in the
query sequences, but what does the dot represent? Does it mark unsequenced
regions, or is it also a gap?

3. In the net alignment construction, the gaps in the top-level chain are
filled by trimmed lower-scoring chains. What is the minimum gap length for
a gap to be filled? Is a 1-base gap in the top-level chain also filled by a
lower-scoring chain?

4. During the net alignment process, are unsequenced regions (N's) treated
as gaps and filled in the same way?

Thank you in advance!

Zhenguo
-- 
——————————————————————
Zhenguo Zhang
Postdoctoral Scholar
Institute of Molecular Evolutionary Genetics
Penn State University
312 Mueller Lab, University Park, PA 16802
Tel: 814-865-2796
Homepage: http://www.personal.psu.edu/zuz17/
Lab:  http://homes.bio.psu.edu/people/Faculty/Nei/