I'm not sure whether the questions below reached the mailing list, as I
did not get a response. The message may have been too long. Thanks for
your help. -Xianjun
=========================================
Hi,
I am back again for more detailed technical help with making a
self-chain for zebrafish.
I studied the wiki page you sent me and learned that you use the
doBlastzChainNet.pl pipeline to make the chain (selfChain), with a DEF
file holding all the parameters needed. I read the doBlastzChainNet.pl
-help output, and also found some sample code by grepping for
"BLASTZ SELF" in ~/kent/src/hg/makeDb/doc/*. Before running the
pipeline, I have several questions:
1. I installed the kent source tree and the blastz program. I also made
the .2bit file for zebrafish (Zv8) and the chromosome size file. What
else do I need to get the selfChain?
2. I made a DEF file like the following. Could you check whether
anything is missing or wrong?
---------------------------------------
# zebrafish vs zebrafish
export PATH=/usr/bin:/bin:/usr/local/bin:/home/xianjund/bin/x86_64
BLASTZ=blastz
BLASTZ_M=400
# TARGET: Zebrafish danRer6
SEQ1_DIR=/export/data/goldenpath/danRer6
SEQ1_LEN=/export/data/goldenpath/zv8_EnsemblPre/zv8.chr.info.txt
SEQ1_CHUNK=10000000
SEQ1_LAP=10000
SEQ1_IN_CONTIGS=0
# QUERY: Zebrafish danRer6
SEQ2_DIR=/export/data/goldenpath/danRer6
SEQ2_LEN=/export/data/goldenpath/zv8_EnsemblPre/zv8.chr.info.txt
SEQ2_CHUNK=10000000
SEQ2_LAP=0
SEQ2_IN_CONTIGS=0
BASE=/export/data/goldenpath/danRer6/blastzSelf.2009-06-02
TMPDIR=/scratch/tmp
---------------------------------------
3. Luckily, I found by grep a piece of code written by Hiram 3 years
ago in ~/kent/src/hg/makeDb/doc/hg18.txt, which helps me a lot (I copied
it here, see below). But I have three questions about it:
1). Do I have to run the pipeline on a cluster? Can I just run it on a
single server, in case I don't have access to a cluster? If so, how
should I set the parameters for the pipeline?
2). After running doBlastzChainNet.pl, it seems you ssh to another
machine ("ssh kolossus") and run featureBits. What is the purpose of
this? Do I have to include this part if I just want to make the
selfChain data?
3). If I want to make a chainSelf table in MySQL (like the table at
UCSC), what additional script should I run for that?
4. Last question: your description page for the self chain says that a
specific matrix was used for the dynamic program that was run over the
kd-trees to find the maximally scoring chains of these blocks. But the
matrix is not given on the tetraodon selfChain page. How should I set
the matrix for making the zebrafish selfChain? Or does this not matter
at all for the integrated doBlastzChainNet.pl pipeline? I ask because
the selfChain is mainly for detecting the paralogous parts of the
genome, and most of those come from duplication (whole-genome
duplication as in teleost fish, or local tandem duplication), so the
matrix for scoring the chain should somehow reflect the time elapsed
since the split point (e.g. when the WGD happened) in each genome. I
guess this should differ between zebrafish (where the WGD happened
300-450 Mya) and human (where the 2R WGD happened much earlier). But I
have no clue how UCSC sets the substitution(?) matrix. I would like to
hear your opinion.
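As a rough illustration of the check asked for in question 2, a DEF
file could be sanity-checked with a small shell function before the
pipeline is launched. This is only a sketch: check_def is a hypothetical
helper, not part of the kent tree, and the variable list is just the set
of names used in the DEF above, not an authoritative list of what
doBlastzChainNet.pl requires.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kent tree).
# Reports variables from the DEF above that are absent from a DEF file,
# and warns when target and query directories differ, which would be
# unexpected for a self-alignment.
# Usage: check_def /path/to/DEF
check_def() {
    def_file=$1
    status=0
    # Assumption: these are simply the names used in the DEF above.
    for var in BLASTZ SEQ1_DIR SEQ1_LEN SEQ1_CHUNK SEQ1_LAP \
               SEQ2_DIR SEQ2_LEN SEQ2_CHUNK SEQ2_LAP BASE; do
        if ! grep -q "^${var}=" "$def_file"; then
            echo "missing: $var"
            status=1
        fi
    done
    # For a self chain, SEQ1 and SEQ2 should point at the same assembly.
    seq1=$(sed -n 's/^SEQ1_DIR=//p' "$def_file")
    seq2=$(sed -n 's/^SEQ2_DIR=//p' "$def_file")
    if [ "$seq1" != "$seq2" ]; then
        echo "warning: SEQ1_DIR and SEQ2_DIR differ"
        status=1
    fi
    return "$status"
}
```

The function prints nothing and returns 0 when every listed variable is
present and both directories match. It only greps the file and never
sources it, so the "export PATH=" line in the DEF is not executed.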
Sorry for so many questions :) Thanks for any help.
-Xianjun
============ sample code from ~/kent/src/hg/makeDb/doc/hg18.txt ============
# BLASTZ SELF (DONE - 2006-01-17 - 2006-01-20 - Hiram)
ssh pk
mkdir /cluster/data/hg18/bed/blastzSelf.2006-01-17
cd /cluster/data/hg18/bed/blastzSelf.2006-01-17
# prepare the DEF file
cd /cluster/data/hg18/bed/blastzSelf.2006-01-17
time /cluster/bin/scripts/doBlastzChainNet.pl -verbose=2 \
-chainMinScore=10000 -chainLinearGap=medium -bigClusterHub=pk \
`pwd`/DEF > blastz.out 2>&1 &
# real 640m37.637s
ssh kolossus
cd /cluster/data/hg18/bed/blastzSelf.2006-01-17
time HGDB_CONF=~/.hg.conf.read-only featureBits \
-noRandom -noHap hg18 chainSelfLink > fb.chainSelfLink 2>&1 &
# real 21m52.697s
# 324067552 bases of 2858034764 (11.339%) in intersection
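
The percentage on the last line of the featureBits output is the first
base count divided by the second. A small sketch that recomputes it from
such a summary line, assuming the "<covered> bases of <total> (...) in
intersection" layout shown above and that the line is passed without its
leading "# " (coverage_pct is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: recompute the coverage percentage from a
# featureBits summary line such as
#   "324067552 bases of 2858034764 (11.339%) in intersection"
# Field 1 is the covered base count, field 4 the total base count.
coverage_pct() {
    echo "$1" | awk '{ printf "%.3f\n", 100 * $1 / $4 }'
}
```

For the hg18 line above this prints 11.339, matching the reported
figure.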
Kayla Smith wrote:
> Hello Xianjun,
>
> Here is a wiki page on chains and nets:
> http://genomewiki.ucsc.edu/index.php/Chains_Nets
>
> You would need to download the kent source tree:
> http://genome.ucsc.edu/FAQ/FAQlicense.html#license3
>
> I hope this information is helpful to you.
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
> ----- "Xianjun" <[email protected]> wrote:
>
>
>> Thanks for the good news
>>
>> Personally, I might have to use zebrafish self-chain data before July.
>>
>> Could you kindly guide me how to do that myself? I mean the zv8, if
>> possible.
>>
>> Regards,
>>
>> Xianjun
>>
>> Donna Karolchik wrote:
>>
>>> hi Xianjun,
>>>
>>> You'll be pleased to hear that we do indeed have the zv8 assembly back
>>> on our active project list, and hope to have it available on the
>>> public site by perhaps sometime in July, depending on other incoming
>>> priorities and staffing levels. We will announce the release on our
>>> [email protected] mailing list when the browser becomes
>>> available.
>>>
>>> -Donna
>>> ---------------
>>> Donna Karolchik
>>> UCSC Genome Browser Project Manager
>>> http://genome.ucsc.edu
>>>
>>>
>>> Xianjun Dong wrote:
>>>
>>>> Hi,
>>>>
>>>> OK. I am back with the same request now, after the
>>>> assembly/annotation of Zv8 is done.
>>>>
>>>> We (and also the community, I think) know that, Zv8 was expected as a
>>>> big improvement for Zv7, and it IS indeed, from the analysis report
>>>> they released. So, we eagerly request UCSC, as one of the main hubs
>>>> of data/tool for bioinformatics, to
>>>> 1. update danRer6 (Zv8) new assembly on UCSC
>>>> 2. make hg18:danRer6 chain/net alignment
>>>> 3. put zebrafish self-chain alignment.
>>>>
>>>> Thanks
>>>>
>>>> Regards,
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> Jennifer Jackson wrote:
>>>>
>>>>> Hello,
>>>>> One of our scientists has some specific ideas concerning the
>>>>> zebrafish assembly as follows:
>>>>>
>>>>> They think the main reason the genome is so difficult to assemble
>>>>> was due to the DNA collection strategy:-
>>>>>
>>>>> http://www.sanger.ac.uk/Projects/D_rerio/Zv3_assembly_information.shtml
>>>>>
>>>>> The FAQ indicates there should be a finished genome by the end of
>>>>> this year:
>>>>> http://www.sanger.ac.uk/Projects/D_rerio/faqs.shtml#factsnine
>>>>>
>>>>> Maybe you could discuss your suggestion with the sequencing project,
>>>>> and if it would help them, we could discuss it further.
>>>>>
>>>>> Thank you for your offer to help improve the data,
>>>>> Jennifer Jackson
>>>>> UCSC Genome Bioinformatics Group
>>>>>
>>>>> Xianjun Dong wrote:
>>>>>
>>>>>> To those who might concern,
>>>>>>
>>>>>> Zebrafish has been one of the most studied models in study of whole
>>>>>> genome duplication and development, but its genome assembly is not
>>>>>> so well (which is naturally difficult also due to the whole genome
>>>>>> duplication there). We also noticed much duplication closely mapped
>>>>>> in same chromosome, which actually are proved as assembly error in
>>>>>> zv7, by BLATing in the new assembly Zv8
>>>>>> (http://pre.ensembl.org/Danio_rerio/Info/Index). Before Zv8
>>>>>> annotation get done (which might help to some extent, but not all),
>>>>>> I am thinking if UCSC could make a self-chain for zebrafish, just
>>>>>> like you did for human. If that information offered, we could write
>>>>>> a script to quickly check those 'tandem' duplications close in
>>>>>> genome, which can eventually help to improve the quality of the
>>>>>> current assembly.
>>>>>>
>>>>>> If you think this might not be done in the coming soon by your
>>>>>> plan, I will be appreciated if you can offer any assistance for me
>>>>>> to try it myself.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>
>>>>>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
--
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: [email protected]
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================