Re: [Genome] batch extracting sequence by coordinates

Daofeng Li Tue, 09 Aug 2011 20:28:21 -0700

Hi Hiram,


But I did not see any explanation of twoBitToFa supporting BED format. Where
is it documented?

Thanks:)

Best.


On Tue, Aug 9, 2011 at 9:08 PM, Hiram Clawson <[email protected]> wrote:

>
> You can always use twoBitToFa with its options of processing
> via a list file or via a bed file.
>
> --Hiram
>
> ----- Original Message -----
> From: "Daofeng Li" <[email protected]>
> To: "Ivan Adzhubey" <[email protected]>
> Cc: [email protected]
> Sent: Tuesday, August 9, 2011 2:11:55 PM
> Subject: Re: [Genome] batch extracting sequence by coordinates
>
> Thanks Ivan.
> Actually, I ended up using the fastaFromBed utility. It runs very fast, and I
> recommend this tool :)
>
> Best.
>
> On Tue, Aug 9, 2011 at 2:39 PM, Ivan Adzhubey <
> [email protected]> wrote:
>
> > Hi Daofeng,
> >
> > I suggest using nibFrag for this purpose. I found it generally faster than
> > twoBitToFa, since for each extraction it only reads a (much smaller)
> > per-chromosome nib file instead of the huge whole-genome 2bit file.
> > Also, nibFrag reverse-complements the extracted sequence automatically
> > when strand=m, while twoBitToFa does not have such an option.
> >
> > The only downside is that you will need to convert the downloaded chromosome
> > .fa.gz files to nib format (UCSC does not provide chromosomes in nib format
> > for download). But you only have to do this once.
> >
> > Best,
> > Ivan
> >
> > On Tuesday, August 09, 2011 03:27:09 PM Daofeng Li wrote:
> > > Hi list members,
> > >
> > > Is there an effective way of extracting sequence from the human genome
> > > hg19 by coordinates?
> > > I have millions of start-end positions, and this amount of data might
> > > not be suitable for the Table Browser.
> > > I was thinking of using the .2bit genome. Any suggestions?
> > > I am also thinking of using the following steps:
> > >
> > > twoBitToFa
> > >
> > > twoBitToFa - Convert all or part of .2bit file to fasta
> > >
> > > usage:
> > >
> > >    twoBitToFa input.2bit output.fa
> > >
> > > options:
> > >
> > >    -seq=name - restrict this to just one sequence
> > >
> > >    -start=X  - start at given position in sequence (zero-based)
> > >
> > >    -end=X - end at given position in sequence (non-inclusive)
> > >
> > >
> > >
> > > faToNib
> > >
> > > faToNib - Convert from .fa to .nib format
> > >
> > > usage:
> > >
> > >    faToNib in.fa out.nib
> > >
> > >
> > >
> > > nibFrag
> > >
> > > nibFrag - Extract part of a nib file as .fa
> > >
> > > usage:
> > >
> > >    nibFrag file.nib start end strand out.fa
> > >
> > > Would this be a fast way?
> > >
> > > Thanks in advance.
> > >
> > > Best.
>
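
For reference, here is a minimal Python sketch (not from the thread) of how one might prepare input files for the list-file and BED-file modes Hiram mentions. It assumes the list file holds one chrom:start-end spec per line and that coordinates are zero-based with a non-inclusive end, as in the -start/-end options quoted above. The function names, the file names, and the fourth BED column (a name built from the coordinates) are illustrative choices, not something the thread specifies.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end): zero-based start, non-inclusive end, matching the
# -start/-end convention in the twoBitToFa usage text quoted above.
Region = Tuple[str, int, int]


def check_region(region: Region) -> Region:
    """Return the region unchanged, or raise ValueError if it is malformed."""
    chrom, start, end = region
    if not chrom or start < 0 or end <= start:
        raise ValueError(f"bad region: {region!r}")
    return region


def seq_list_lines(regions: Iterable[Region]) -> List[str]:
    """One 'chrom:start-end' spec per region (assumed list-file format)."""
    return ["%s:%d-%d" % check_region(r) for r in regions]


def bed_lines(regions: Iterable[Region]) -> List[str]:
    """Four-column BED lines: chrom, start, end, name (name is an assumption)."""
    lines = []
    for r in regions:
        chrom, start, end = check_region(r)
        lines.append(f"{chrom}\t{start}\t{end}\t{chrom}:{start}-{end}")
    return lines


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write each line to path, newline-terminated."""
    with open(path, "w") as handle:
        handle.writelines(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical example regions and output file names.
    regions = [("chr1", 0, 189), ("chr2", 1000, 1050)]
    write_lines("regions.lst", seq_list_lines(regions))
    write_lines("regions.bed", bed_lines(regions))
```

The resulting files could then be passed to twoBitToFa as -seqList=regions.lst or -bed=regions.bed, assuming your build lists those options; running twoBitToFa with no arguments prints the usage message to check against.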



-- 
Daofeng Li
Postdoc Research Associate
Department of Genetics
Washington University in St.Louis School of Medicine
314-556-2832
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
