Hi Brian,

Thanks for your suggestions. I will have a look and see whether I can get the answer.

Best regards!
Zhenguo

2011/2/7 Brian Raney <[email protected]>

> Hey Zhenguo,
>
> Those are some good questions! I think maybe the best course for you
> would be to look through our makeDoc files, which are in the kent
> source distribution in the directory kent/src/hg/makeDb/doc. The one
> for hg19 is named hg19.txt. These makeDoc files contain
> documentation of all the programs that were run to build the assembly
> annotation. Many of these procedures are coded as Perl scripts that
> appear in the kent/src/hg/utils/automation directory.
>
> Within these resources you should find the answers to all your
> questions, at least such that you can recreate the steps we take to
> build the annotation.
>
> I hope this resolves your issue. If you have more questions please
> respond to this list.
>
> brian
>
> On Fri, Feb 4, 2011 at 2:56 PM, Zhenguo Zhang <[email protected]>
> wrote:
> > Dear Brian,
> >
> > Thank you for this information. I will try mafsInRegion (which seems
> > very slow). I have a few questions related to the net alignment.
> >
> > 1. During net construction with the chainNet tool, some (long) chains
> > are trimmed (to match the boundaries of parent gaps) when filling gaps
> > in the parent chain, so for these chains only the trimmed portion is
> > recorded in the net file. Am I right?
> >
> > 2. If the above is true, and given that the axtNet files in
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/axtNet are
> > generated by filtering all the chains (file hg19.rheMac2.all.chain.gz)
> > using netToAxt, do the alignments in the axtNet files contain only the
> > trimmed portion for those trimmed chains, or the whole chain?
> >
> > 3. netChainSubset is used to generate the liftOver files in directory
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver. This process
> > also uses the net file to filter the chains, so what happens for those
> > trimmed chains? Is the option -wholeChains used in your procedure?
> >
> > 4. Based on my understanding, the axtNet files and liftOver files are
> > two different formats (axt and chain, respectively) of the same chains,
> > both filtered by net files. Am I correct? If not, what is the
> > difference?
> >
> > 5. The Multiz46way alignment uses the pairwise alignments produced
> > above. Do they just include the trimmed alignments from trimmed chains?
> > The conservation track description page says that an additional
> > filtering step was introduced to reduce the number of paralogs and
> > pseudogenes. I think that in net files, for a given target region (say
> > hg18), there is only one best aligned region (if available) from the
> > query species (say rheMac2). What does 'reduce the number of paralogs'
> > mean? Does it operate on the query species?
> >
> > Sorry I have so many questions, but it is important for me to
> > manipulate the data correctly and efficiently. Your help is greatly
> > appreciated and I am happy to cite UCSC. Thank you!
> >
> > Zhenguo
> >
> > 2011/2/4 Brian Raney <[email protected]>
> >
> >> Hey Zhenguo,
> >>
> >> I think for your purposes, the mafsInRegion program will be better to
> >> use than mafFrags. mafsInRegion does not coalesce blocks, so it will
> >> maintain the query addresses, and it won't put in the '.'s.
> >>
> >> Our standard alignment procedure calls the chainNet program (used to
> >> create the nets from the chains) with a parameter that sets the
> >> minimum gap to fill to be one base. Whether N's are considered to be
> >> a gap or not depends on how many of them there are, and whether the
> >> chaining process included them in the chain. In general, if there are
> >> only a few N's, the chains will include them, so they won't be
> >> considered a gap in the net.
> >>
> >> I hope this answers your questions. If you have follow-up questions,
> >> please address them to this list.
> >>
> >> Brian
> >>
> >> On Thu, Feb 3, 2011 at 9:51 AM, Zhenguo Zhang <[email protected]>
> >> wrote:
> >> > Dear Colleagues,
> >> >
> >> > I am trying to get the orthologous/homologous regions from the hg18
> >> > multiz28way maf files for a list of human coordinates. I have read
> >> > the documents and track descriptions, but still have several
> >> > questions:
> >> >
> >> > 1. The results from mafFrags lose the genomic coordinate information
> >> > for all the query species, and there is one maf block for each
> >> > coordinate region in the input bed file. A block may contain genomic
> >> > breaks in some species; that is, the corresponding sequence for a
> >> > query species (say, rheMac2) in the maf block is composed of
> >> > different genomic locations (from the same or different
> >> > chromosomes). I derive this from the net alignment construction
> >> > procedure. I need to know this information and exclude these cases
> >> > because they are artificial and not true sequences. How can I get
> >> > the genomic breaks in the results from mafFrags? When I checked the
> >> > alignment in the UCSC browser, it is shown in different blocks,
> >> > which seem to be based on genomic breaks in any species.
> >> >
> >> > 2. In the mafFrags results, there are two symbols indicating gaps.
> >> > One is the dash '-', and the other is the dot '.'. I think the dash
> >> > marks real gaps in the query sequences, but what does the dot
> >> > represent? Does it represent unsequenced regions or gaps too?
> >> >
> >> > 3. In the net alignment construction, the gaps in the top-level
> >> > chain are filled by trimmed lower-scoring chains. What is the
> >> > minimum length for gaps to be filled? Is a 1-base-long gap in the
> >> > top-level chain also filled by a lower-scoring chain?
> >> >
> >> > 4. During the net alignment process, are the unsequenced regions
> >> > (N's) regarded as gaps and filled during this process?
> >> >
> >> > Thank you in advance!
> >> >
> >> > Zhenguo
> >> > --
> >> > ----------------------
> >> > Zhenguo Zhang
> >> > Postdoctoral Scholar
> >> > Institute of Molecular Evolutionary Genetics
> >> > Penn State University
> >> > 312 Mueller Lab, University Park, PA 16802
> >> > Tel: 814-865-2796
> >> > Homepage: http://www.personal.psu.edu/zuz17/
> >> > Lab: http://homes.bio.psu.edu/people/Faculty/Nei/
> >> > _______________________________________________
> >> > Genome maillist - [email protected]
> >> > https://lists.soe.ucsc.edu/mailman/listinfo/genome

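On question 1 of the original Feb 3 message (finding genomic breaks in a query species): since mafsInRegion keeps the query coordinates, one possible approach is to scan consecutive MAF blocks and flag places where the query sequence is not contiguous. The Python below is only a minimal sketch under stated assumptions, not part of the kent tools. The names `Seg`, `read_blocks`, and `find_breaks` are hypothetical, and the sketch assumes standard MAF `s` lines (`s src start size strand srcSize text`) with sources named like `rheMac2.chr1`. It treats any change of chromosome, change of strand, or coordinate jump as a break, so unaligned query insertions between blocks are flagged too.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Seg(NamedTuple):
    """Coordinates of one 's' line in a MAF block (hypothetical helper type)."""
    src: str      # e.g. "rheMac2.chr1"
    start: int    # zero-based start, relative to the given strand
    size: int     # number of non-gap bases in this block
    strand: str   # "+" or "-"


def read_blocks(lines: Iterable[str]) -> Iterator[Dict[str, Seg]]:
    """Yield one {src: Seg} dict per alignment block, in file order."""
    block: Dict[str, Seg] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
            block = {}
        elif fields[0] == "s":
            block[fields[1]] = Seg(fields[1], int(fields[2]), int(fields[3]), fields[4])
        # Header lines and 'i', 'e', 'q' lines are ignored in this sketch.
    if block:
        yield block


def find_breaks(blocks: Iterable[Dict[str, Seg]], species: str) -> List[Tuple[Seg, Seg]]:
    """Return (previous, current) pairs where the species is not contiguous."""
    prefix = species + "."
    prev: Optional[Seg] = None
    breaks: List[Tuple[Seg, Seg]] = []
    for block in blocks:
        cur = next((seg for src, seg in block.items() if src.startswith(prefix)), None)
        if cur is None:
            continue  # species absent from this block; keep the last seen segment
        if prev is not None and (
            cur.src != prev.src
            or cur.strand != prev.strand
            or cur.start != prev.start + prev.size
        ):
            breaks.append((prev, cur))
        prev = cur
    return breaks
```

A call such as `find_breaks(read_blocks(open("region.maf")), "rheMac2")` (the file name is a placeholder) would list each pair of adjacent blocks between which rheMac2 switches location. If short unaligned insertions should not count as breaks, the coordinate test could be relaxed to allow a small gap.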