Hi Brian,

Thanks for your suggestions. I will have a look and see whether I can get
the answer.

Best regards!

Zhenguo

2011/2/7 Brian Raney <[email protected]>

> Hey Zhenguo,
>
> Those are some good questions!  I think maybe the best course for you
> would be to look through our makeDoc files which are in the kent
> source distribution in the directory kent/src/hg/makeDb/doc.  The one
> for hg19 is named hg19.txt.   These makeDoc files contain
> documentation of all the programs that were run to build the assembly
> annotation.  Many of these procedures are coded as Perl scripts that
> appear in the kent/src/hg/utils/automation directory.
>
> Within these resources you should find the answers to all your
> questions, at least such that you can recreate the steps we take to
> build the annotation.
>
> I hope this resolves your issue.  If you have more questions please
> respond to this list.
>
> brian
>
> On Fri, Feb 4, 2011 at 2:56 PM, Zhenguo Zhang <[email protected]>
> wrote:
> > Dear Brian,
> >
> > Thank you for this information. I will try mafsInRegion (which seems very
> > slow). I have a few questions related to the net alignment.
> >
> > 1. During net construction with chainNet, some (long) chains are trimmed
> > (to match the boundaries of the parent gaps) when they fill gaps in a parent
> > chain, so for these chains only the trimmed portion is recorded in the net
> > file. Am I right?
> >
> > 2. If the above is true, and I know that the axtNet files in
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/axtNet are generated
> > by filtering all the chains (file hg19.rheMac2.all.chain.gz) using netToAxt,
> > then do the alignments in the axtNet files contain only the trimmed portion
> > of those trimmed chains, or the whole chain?
> >
> > 3. netChainSubset is used to generate the liftOver files in the directory
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver. This process also
> > uses the net file to filter the chains, so what happens to those trimmed
> > chains? Is the option -wholeChains used in your procedure?
> >
> > 4. Based on my understanding, the axtNet files and liftOver files are two
> > different formats (axt and chain, respectively) of the same chains, both
> > filtered by net files. Am I correct? If not, what is the difference?
> >
> > 5. The multiz46way alignment uses the pairwise alignments produced above. Do
> > they include only the trimmed alignments from trimmed chains? The
> > conservation track description page says that an additional filtering step
> > was introduced to reduce the number of paralogs and pseudogenes. I think that
> > in the net files, for a given target region (say, in hg18), there is only one
> > best-aligned region (if available) from the query species (say, rheMac2).
> > What does 'reduce the number of paralogs' mean? Does it operate on the query
> > species?
> >
> > Sorry I have so many questions, but it is important for me to manipulate the
> > data correctly and efficiently. Your help is greatly appreciated and I am
> > happy to cite UCSC. Thank you!
> >
> > Zhenguo
> >
> >
> > 2011/2/4 Brian Raney <[email protected]>
> >
> >> Hey Zhenguo,
> >>
> >> I think for your purposes, the mafsInRegion program will be better to
> >> use than mafFrags.  mafsInRegion does not coalesce blocks, so it will
> >> maintain the query addresses, and it won't put in the '.'s.
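
Editorial sketch: because mafsInRegion keeps the query coordinates, genomic breaks in a query species can be looked for by checking whether consecutive blocks are contiguous in that species. The Python below is a minimal illustration under stated assumptions, not part of the UCSC tools: it assumes standard MAF "s" lines (src, start, size, strand, srcSize, text), and the names parse_maf_blocks and find_breaks are hypothetical. It reports a break whenever the query source, strand, or expected start position changes between one block and the next block that contains the species.

```python
from collections import namedtuple

# One MAF "s" line: src start size strand srcSize text
SeqLine = namedtuple("SeqLine", "src start size strand src_size text")


def parse_maf_blocks(lines):
    """Yield each alignment block as a list of SeqLine records ('s' lines only)."""
    block = []
    for raw in lines:
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # A new block starts; emit the previous one if it had sequences.
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            block.append(
                SeqLine(src, int(start), int(size), strand, int(src_size), text)
            )
    if block:
        yield block


def find_breaks(blocks, species):
    """Return (block_index, previous, current) wherever `species` is not contiguous.

    Contiguous means: same source sequence, same strand, and the current start
    equals the previous start plus the previous size. Blocks that lack the
    species are skipped, so the comparison is against the last block seen.
    """
    prefix = species + "."
    breaks = []
    prev = None
    for index, block in enumerate(blocks):
        cur = next((s for s in block if s.src.startswith(prefix)), None)
        if cur is None:
            continue
        if prev is not None:
            contiguous = (
                cur.src == prev.src
                and cur.strand == prev.strand
                and cur.start == prev.start + prev.size
            )
            if not contiguous:
                breaks.append((index, prev, cur))
        prev = cur
    return breaks


if __name__ == "__main__":
    # Tiny made-up example, not real alignment data.
    demo = [
        "a score=1",
        "s hg18.chr1 100 4 + 1000 ACGT",
        "s rheMac2.chr1 200 4 + 2000 ACGT",
        "",
        "a score=2",
        "s hg18.chr1 104 4 + 1000 ACGT",
        "s rheMac2.chr5 50 4 - 2000 ACGT",
    ]
    for index, prev, cur in find_breaks(list(parse_maf_blocks(demo)), "rheMac2"):
        print(index, prev.src, "->", cur.src)
```

This strict test also flags ordinary query-side insertions between blocks, so a tolerance on the distance between blocks may be more appropriate in practice.
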
> >>
> >> Our standard alignment procedure calls the chainNet program (used to
> >> create the nets from the chains) with a parameter that sets the
> >> minimum gap to fill to be one base.   Whether N's are considered to be
> >> a gap or not depends on how many of them there are, and whether the
> >> chaining process included them in the chain.  In general, if there are
> >> only a few N's, the chains will include them, so they won't be
> >> considered a gap in the net.
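
Editorial sketch: to see which gaps in a chain are even candidates for filling at a given minimum size, the target-side gaps can be listed directly from the chain file. The Python below is only an illustration and is not how chainNet is implemented. It assumes the standard chain format (a header line "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id" followed by "size dt dq" lines and a final "size" line). The function name chain_target_gaps and its min_space argument are hypothetical; the thread does not name the chainNet parameter, which is probably its -minSpace option, and that identification is an assumption here.

```python
def chain_target_gaps(chain_lines, min_space=1):
    """Return (start, end) target coordinates of gaps in the first chain.

    Only gaps with at least `min_space` bases on the target side are kept.
    Coordinates are zero-based and half-open, as in the chain header.
    """
    gaps = []
    in_chain = False
    pos = 0
    for line in chain_lines:
        fields = line.split()
        if not fields:
            if in_chain:
                break  # a blank line ends the chain
            continue
        if fields[0] == "chain":
            if in_chain:
                break  # only the first chain is examined
            in_chain = True
            pos = int(fields[5])  # tStart
        elif in_chain:
            pos += int(fields[0])  # ungapped aligned block
            if len(fields) == 3:
                dt = int(fields[1])  # gap length on the target side
                if dt >= min_space:
                    gaps.append((pos, pos + dt))
                pos += dt
    return gaps


if __name__ == "__main__":
    # Made-up chain for illustration only.
    demo = [
        "chain 1000 chr1 1000 + 100 130 chr2 2000 + 500 528 1",
        "10 1 0",
        "5 3 2",
        "11",
    ]
    print(chain_target_gaps(demo, min_space=1))
    print(chain_target_gaps(demo, min_space=2))
```

With a minimum of one base, every target-side gap is listed, including single-base gaps; raising the threshold drops the short ones.
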
> >>
> >> I hope this answers your questions.   If you have follow-up questions,
> >> please address them to this list.
> >>
> >> Brian
> >>
> >> On Thu, Feb 3, 2011 at 9:51 AM, Zhenguo Zhang <[email protected]>
> >> wrote:
> >> > Dear Colleagues,
> >> >
> >> > I am trying to get the orthologous/homologous regions from the hg18
> >> > multiz28way maf files for a list of human coordinates. I have read the
> >> > documents and track descriptions, but still have several questions:
> >> >
> >> > 1. The results from mafFrags lose the genomic coordinate information for
> >> > all the query species, and give one maf block for each coordinate region
> >> > in the input bed file. A block may contain genomic breaks in some species;
> >> > that is, the corresponding sequence for a query species (say, rheMac2) in
> >> > the maf block may be composed of different genomic locations (from the
> >> > same or different chromosomes). I infer this from the net alignment
> >> > construction procedure. I need to know this information and exclude these
> >> > cases because they are artificial and not true sequences. How can I get
> >> > the genomic breaks in the results from mafFrags? When I checked the
> >> > alignment in the UCSC browser, it was shown in separate blocks, which seem
> >> > to be based on genomic breaks in any species.
> >> >
> >> > 2. In the mafFrags results, there are two symbols indicating gaps. One is
> >> > the dash '-', and the other is the dot '.'. I think the dash marks real
> >> > gaps in the query sequences, but what does the dot represent? Does it
> >> > represent unsequenced regions, or gaps too?
> >> >
> >> > 3. In the net alignment construction, the gaps in the top-level chain are
> >> > filled by trimmed lower-scoring chains. What is the minimum length for
> >> > gaps to be filled? Is a 1-base-long gap in the top-level chain also filled
> >> > by a lower-scoring chain?
> >> >
> >> > 4. During the net alignment process, are the unsequenced regions (N's)
> >> > regarded as gaps and filled during this process?
> >> >
> >> > Thank you in advance!
> >> >
> >> > Zhenguo
> >> > --
> >> > ——————————————————————
> >> > Zhenguo Zhang
> >> > Postdoctoral Scholar
> >> > Institute of Molecular Evolutionary Genetics
> >> > Penn State University
> >> > 312 Mueller Lab, University Park, PA 16802
> >> > Tel: 814-865-2796
> >> > Homepage: http://www.personal.psu.edu/zuz17/
> >> > Lab:  http://homes.bio.psu.edu/people/Faculty/Nei/
> >> > _______________________________________________
> >> > Genome maillist  -  [email protected]
> >> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> >> >
> >>
>



