Good Morning:
These genomes are not repeat masked because they are so small.
In this case it isn't important to get rid of the repeats. If alignments
can be completed efficiently without removing the repeats,
then it is OK to skip the RepeatMasker step.
I see that the contents of chain.csh are missing from the sacCer2.txt file:
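#!/bin/csh -ef
# $1 = query species (e.g. sacBay), $2 = sacCer2 chromosome (e.g. chrIV)
set S = $1
set C = $2
# input: psl alignments for this chromosome/species pair; output: its chain file
set IN = ../../psl/${C}.${S}.psl.gz
set OUT = ${S}/${C}.chain
mkdir -p ${S}
# chain the psl alignments, then drop chains that are mostly repeats
zcat ${IN} \
  | axtChain -psl -verbose=0 -minScore=1000 -linearGap=medium stdin \
      /hive/data/genomes/sacCer2/sacCer2.2bit \
      -faQ ../../${S} stdout \
  | chainAntiRepeat /hive/data/genomes/sacCer2/sacCer2.2bit \
      ../../${S}.2bit stdin ${OUT}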
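Each invocation of this script looks like:
  chain.csh sacBay chrIV sacBay/chrIV.chain
The script itself reads only the first two arguments (species and
chromosome); the third names the chain file that the run produces.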
Each reference to a source file such as ../../sacBay
is a reference to the fasta sequence in that file, which is why
axtChain is given the -faQ option.
These are plain ASCII fasta files. There is no need for nib files
because the genomes are tiny.
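
To make the invocation pattern concrete, here is a rough sketch of how
a list of such command lines could be generated, one per species and
chromosome pair. It is written in plain sh rather than csh, the
function name make_chain_jobs is made up for this illustration, and
the species and chromosome lists passed to it are placeholders, not
the actual lists used for the 7-way alignment.

```shell
#!/bin/sh
# Hypothetical helper: print one chain.csh command line per
# (species, chromosome) pair. Both arguments are space-separated lists.
make_chain_jobs() {
    species_list=$1
    chrom_list=$2
    for S in $species_list; do
        for C in $chrom_list; do
            # Arguments: query species, target chromosome, expected output file.
            echo "chain.csh $S $C $S/$C.chain"
        done
    done
}

# Placeholder lists for illustration only.
make_chain_jobs "sacBay" "chrI chrIV"
```

Each printed line has the same shape as the example invocation above
and could be run directly or handed to whatever job scheduler is
available.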
--Hiram
meritxell oliva wrote:
> Dear UCSCers,
>
> I'm trying to reproduce the Scer2 7-way alignment, as in
> http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/doc/sacCer2.txt
>
> . Before running blastz, genomes are usually faToNib-masked for repeats,
> like in the example based on ciona alignments:
> http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
>
> In the yeast alignment, this step is missing. Am I right? If so, why? I
> mean, what should the criteria be for including or skipping this step?
>
> . Even if *.nib files are not necessary to feed blastz, it seems that they
> are necessary to carry out the axtChain-chaining step. Is that right?
>
> . Although chain.csh is created for the chaining step, I don't see axtChain
> invoked anywhere in the pipeline. I can only observe that *.chain files are
> created somehow and then used to feed chainMergeSort.
> Could you please clarify how the chaining step is performed?
>
> Thanks in advance
>
> Meritxell Oliva
> PhD Student
> Comparative Bioinformatics Group
> Bioinformatics and Genomics Programme
>
> Centre de Regulacio Genomica (CRG)
> Dr. Aiguader, 88
> 08003 Barcelona
> Spain