Re: [Genome] batch extracting sequence by coordinates

Ivan Adzhubey Tue, 09 Aug 2011 12:40:32 -0700

Hi Daofeng,

I suggest using nibFrag for this purpose. I have found it generally faster than 
twoBitToFa, since each extraction reads only a much smaller per-chromosome nib 
file instead of the huge whole-genome 2bit file. nibFrag also 
reverse-complements the extracted sequence automatically when strand=m, while 
twoBitToFa has no such option.


The only downside is that you will need to convert the downloaded chromosome 
.fa.gz files to nib format with faToNib (UCSC does not provide chromosomes in 
nib format for download). But you only have to do this once.

Best,
Ivan

On Tuesday, August 09, 2011 03:27:09 PM Daofeng Li wrote:
> Hi list members,
> 
> Is there an efficient way to extract sequences from the human genome (hg19) by
> coordinates?
> I have millions of start-end positions, and this amount of data is probably not
> suitable for the Table Browser.
> I was thinking of using the .2bit genome; any suggestions?
> I am also thinking of using the following steps:
> 
> twoBitToFa
> 
> twoBitToFa - Convert all or part of .2bit file to fasta
> 
> usage:
> 
>    twoBitToFa input.2bit output.fa
> 
> options:
> 
>    -seq=name - restrict this to just one sequence
> 
>    -start=X  - start at given position in sequence (zero-based)
> 
>    -end=X - end at given position in sequence (non-inclusive)
> 
> 
> 
> faToNib
> 
> faToNib - Convert from .fa to .nib format
> 
> usage:
> 
>    faToNib in.fa out.nib
> 
> 
> 
> nibFrag
> 
> nibFrag - Extract part of a nib file as .fa
> 
> usage:
> 
>    nibFrag file.nib start end strand out.fa
> 
> Would this be the fastest way?
> 
> Thanks in advance.
> 
> Best.
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
