Hi,

I am back again for more detailed technical help with making a
self-chain for zebrafish.

I read the wiki page you sent me and learned that you use the
doBlastzChainNet.pl pipeline to make chains (selfChain), with a DEF file
holding all the required parameters. I read the doBlastzChainNet.pl -help
output and also found some sample code by running grep "BLASTZ SELF" on
~/kent/src/hg/makeDb/doc/*. Before running the pipeline, I have several
questions:

1. I installed the kent source tree and the blastz program. I also made
the .2bit file for zebrafish (Zv8) and the chromosome size file. What
else do I need to make the selfChain?

2. I made a DEF file like the following. Could you check whether
anything is missing or wrong?

---------------------------------------
# zebrafish vs zebrafish
export PATH=/usr/bin:/bin:/usr/local/bin:/home/xianjund/bin/x86_64

BLASTZ=blastz
BLASTZ_M=400

# TARGET: Zebrafish danRer6
SEQ1_DIR=/export/data/goldenpath/danRer6
SEQ1_LEN=/export/data/goldenpath/zv8_EnsemblPre/zv8.chr.info.txt
SEQ1_CHUNK=10000000
SEQ1_LAP=10000
SEQ1_IN_CONTIGS=0

# QUERY: Zebrafish danRer6
SEQ2_DIR=/export/data/goldenpath/danRer6
SEQ2_LEN=/export/data/goldenpath/zv8_EnsemblPre/zv8.chr.info.txt
SEQ2_CHUNK=10000000
SEQ2_LAP=0
SEQ2_IN_CONTIGS=0

BASE=/export/data/goldenpath/danRer6/blastzSelf.2009-06-02
TMPDIR=/scratch/tmp
---------------------------------------

3. Luckily, I found a piece of code written by Hiram three years ago in
~/kent/src/hg/makeDb/doc/hg18.txt, which helps me a lot (I copied it
below). I have three questions about it:
    1). Do I have to run the pipeline on a cluster? Can I just run it
on a single server if I don't have access to a cluster? If so, how
should I set the parameters for the pipeline?
    2). After running doBlastzChainNet.pl, it seems you ssh to
another machine ("ssh kolossus") and run featureBits. What is the
purpose of this? Do I need this step if I just want to make
the selfChain data?
    3). If I want to make a chainSelf table in MySQL (like the table at
UCSC), what additional script should I run?

4. Last question: I noticed that your description page for the self
Chain says a specific matrix was used for the dynamic program that was
run over the kd-trees to find the maximally scoring chains of these
blocks. But the matrix is not given on the tetraodon selfChain page. How
should I set the matrix for making the zebrafish selfChain? Or does this
not matter at all for the integrated doBlastzChainNet.pl pipeline? The
reason I ask is that the selfChain is mainly for detecting the
paralogous parts of the genome, and most of those come from duplication
(whole-genome duplication as in teleost fish, or local tandem
duplication), so the matrix for scoring the chain should somehow reflect
the time since the split point (e.g. when the WGD happened) in each
genome. I guess this should be different for zebrafish (where the WGD
happened 300-450 Mya) and human (where the 2R WGD happened much
earlier). But I have no clue how UCSC sets the substitution(?) matrix. I
would like to hear your opinion.
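
Related to question 2: a minimal sketch of a sanity check for the DEF
file, assuming it is plain shell that can be sourced and that SEQ1_DIR,
SEQ1_LEN, SEQ2_DIR and SEQ2_LEN should each name an existing file or
directory. check_def is a hypothetical helper name and is not part of
the kent source tree; it only tests that the paths exist, not that
their contents are what the pipeline expects.

```shell
#!/bin/sh
# Sketch only: check that the paths named in a DEF file exist.
# check_def is a hypothetical helper, not part of the kent source tree.
check_def() (
    # The function body runs in a subshell, so the DEF's PATH export
    # and variables do not leak into the calling shell.
    def=$1
    if [ ! -r "$def" ]; then
        echo "cannot read DEF file: $def" >&2
        exit 2
    fi
    . "$def"
    status=0
    for var in SEQ1_DIR SEQ1_LEN SEQ2_DIR SEQ2_LEN; do
        # Look up the value of the variable whose name is in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -e "$val" ]; then
            echo "$var points to a missing path: $val" >&2
            status=1
        fi
    done
    exit "$status"
)
```

It could be called as check_def ./DEF before starting the pipeline; the
path should contain a slash so that the shell does not search PATH for
the file.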

Sorry for so many questions :) Thanks for any help.

-Xianjun

============ sample code from ~/kent/src/hg/makeDb/doc/hg18.txt ============

#  BLASTZ SELF (DONE - 2006-01-17 - 2006-01-20 - Hiram)

    ssh pk
    mkdir /cluster/data/hg18/bed/blastzSelf.2006-01-17
    cd /cluster/data/hg18/bed/blastzSelf.2006-01-17

    # prepare the DEF file

    cd /cluster/data/hg18/bed/blastzSelf.2006-01-17
    time /cluster/bin/scripts/doBlastzChainNet.pl -verbose=2 \
        -chainMinScore=10000 -chainLinearGap=medium -bigClusterHub=pk \
        `pwd`/DEF > blastz.out 2>&1 &
    #   real    640m37.637s

    ssh kolossus
    cd /cluster/data/hg18/bed/blastzSelf.2006-01-17
    time HGDB_CONF=~/.hg.conf.read-only featureBits \
        -noRandom -noHap hg18 chainSelfLink > fb.chainSelfLink 2>&1 &
    #   real    21m52.697s
    #   324067552 bases of 2858034764 (11.339%) in intersection
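
The last comment line of the featureBits output above gives the bases
covered by chainSelfLink, the total bases counted, and the percentage.
A minimal sketch of that arithmetic, using only the two numbers copied
from the output (coverage_pct is a hypothetical helper name, not a kent
utility):

```shell
#!/bin/sh
# Sketch only: recompute the percentage that featureBits prints.
# coverage_pct is a hypothetical helper name, not a kent utility.
coverage_pct() {
    # $1 = bases covered, $2 = total bases
    awk -v covered="$1" -v total="$2" \
        'BEGIN { printf "%.3f\n", 100 * covered / total }'
}

# Numbers copied from the hg18 featureBits output above.
coverage_pct 324067552 2858034764   # prints 11.339
```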


Kayla Smith wrote:
> Hello Xianjun,
>
> Here is a wiki page on chains and nets:
> http://genomewiki.ucsc.edu/index.php/Chains_Nets
>
> You would need to download the kent source tree:
> http://genome.ucsc.edu/FAQ/FAQlicense.html#license3
>
> I hope this information is helpful to you.  
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
> ----- "Xianjun" <[email protected]> wrote:
>
>> Thanks for the good news
>>
>> Personally, I might have to use zebrafish self-chain data before July.
>>
>> Could you kindly guide me how to do that myself? I mean the zv8, if 
>> possible.
>>
>> Regards,
>>
>> Xianjun
>>
>> Donna Karolchik wrote:
>>> hi Xianjun,
>>>
>>> You'll be pleased to hear that we do indeed have the zv8 assembly back
>>> on our active project list, and hope to have it available on the
>>> public site by perhaps sometime in July, depending on other incoming
>>> priorities and staffing levels. We will announce the release on our
>>> [email protected] mailing list when the browser becomes
>>> available.
>>>
>>> -Donna
>>> ---------------
>>> Donna Karolchik
>>> UCSC Genome Browser Project Manager
>>> http://genome.ucsc.edu
>>>
>>>
>>> Xianjun Dong wrote:
>>>> Hi,
>>>>
>>>> OK. I am back with the same request now, after the
>>>> assembly/annotation of Zv8 is done.
>>>>
>>>> We (and also the community, I think) know that, Zv8 was expected as a
>>>> big improvement for Zv7, and it IS indeed, from the analysis report
>>>> they released. So, we eagerly request UCSC, as one of the main hubs
>>>> of data/tool for bioinformatics, to
>>>> 1. update danRer6 (Zv8) new assembly on UCSC
>>>> 2. make hg18:danRer6 chain/net alignment
>>>> 3. put zebrafish self-chain alignment.
>>>>
>>>> Thanks
>>>>
>>>> Regards,
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> Jennifer Jackson wrote:
>>>>> Hello,
>>>>> One of our scientists has some specific ideas concerning the
>>>>> zebrafish assembly as follows:
>>>>>
>>>>> They think the main reason the genome is so difficult to assemble
>>>>> was due to the DNA collection strategy:-
>>>>> http://www.sanger.ac.uk/Projects/D_rerio/Zv3_assembly_information.shtml
>>>>>
>>>>> The FAQ indicates there should be a finished genome by the end of
>>>>> this year:
>>>>> http://www.sanger.ac.uk/Projects/D_rerio/faqs.shtml#factsnine
>>>>>
>>>>> Maybe you could discuss your suggestion with the sequencing project,
>>>>> and if it would help them, we could discuss it further.
>>>>>
>>>>> Thank you for your offer to help improve the data,
>>>>> Jennifer Jackson
>>>>> UCSC Genome Bioinformatics Group
>>>>>
>>>>> Xianjun Dong wrote:
>>>>>> To those who might concern,
>>>>>>
>>>>>> Zebrafish has been one of the most studied models in study of whole
>>>>>> genome duplication and development, but its genome assembly is not
>>>>>> so well (which is naturally difficult also due to the whole genome
>>>>>> duplication there). We also noticed much duplication closely mapped
>>>>>> in same chromosome, which actually are proved as assembly error in
>>>>>> zv7, by BLATing in the new assembly Zv8
>>>>>> (http://pre.ensembl.org/Danio_rerio/Info/Index). Before Zv8
>>>>>> annotation get done (which might help to some extent, but not all),
>>>>>> I am thinking if UCSC could make a self-chain for zebrafish, just
>>>>>> like you did for human. If that information offered, we could write
>>>>>> a script to quickly check those 'tandem' duplications close in
>>>>>> genome, which can eventually help to improve the quality of the
>>>>>> current assembly.
>>>>>>
>>>>>> If you think this might not be done in the coming soon by your
>>>>>> plan, I will be appreciated if you can offer any assistance for me
>>>>>> to try it myself.
>>>>>>
>>>>>> Regards,
>>>>>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>     


-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: [email protected]
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

