Dear Brooke,

Thank you for sending me the fasta-subseq executable.

I tested the program. For future reference, it does indeed assume 1-based, 
inclusive coordinates (so "fasta-subseq filename 1 10" returns the first ten 
bases of the sequence). If strand is '-', it first takes the reverse complement 
of the entire sequence and then applies the start and end coordinates to that 
reverse-complemented sequence. So "fasta-subseq filename 1 10 -" is in general 
not the reverse complement of "fasta-subseq filename 1 10".
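
To make the strand behaviour concrete, here is a minimal Python sketch of what 
I observed. It is my own reimplementation for illustration only, not the 
fasta-subseq source code; the function names (subseq, reverse_complement) and 
the example sequence are made up, and only plain A/C/G/T/N characters are 
handled.

```python
# Illustrative sketch of the observed behaviour; NOT the fasta-subseq source.
# Assumption: the sequence contains only A, C, G, T, N (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def subseq(seq, lo, hi, strand="+"):
    """Return bases lo..hi using 1-based, inclusive coordinates.

    For strand '-', the whole sequence is reverse complemented first,
    and lo/hi are then applied to the reverse-complemented sequence.
    """
    if strand == "-":
        seq = reverse_complement(seq)
    return seq[lo - 1:hi]


seq = "AAAACCCCGGGGTTTTACGT"           # made-up 20-base example
plus = subseq(seq, 1, 10)              # first ten bases of the forward strand
minus = subseq(seq, 1, 10, "-")        # first ten bases of the reverse strand

print(plus)                            # AAAACCCCGG
print(minus)                           # ACGTAAAACC
print(reverse_complement(plus))        # CCGGGGTTTT, which differs from minus
```

In this sketch, the '-' result for 1..10 corresponds to the reverse complement 
of the last ten forward-strand bases (positions 11..20 of the 20-base example), 
not of the first ten.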

The headers are modified as follows. Without reverse complementing, the new 
header is 

>filename:start-end:___

that is, with three underscores at the end. With reverse complementing, the new 
header is

>filename:start-end:__-

that is, with two underscores followed by a single dash.
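
For anyone who needs to reproduce these headers (for example when replacing 
fasta-subseq with another tool), the rule can be sketched in Python as below. 
The function name subseq_header and the file name chr1.fa are hypothetical. I 
am assuming that only the last character of the suffix depends on the strand.

```python
# Illustrative sketch of the observed header format; NOT the fasta-subseq source.
def subseq_header(filename, start, end, reverse=False):
    """Build the header line written for a subsequence.

    Observed suffixes: '___' without reverse complementing,
    '__-' with reverse complementing.
    """
    suffix = "__-" if reverse else "___"
    return f">{filename}:{start}-{end}:{suffix}"


print(subseq_header("chr1.fa", 1, 10))                 # >chr1.fa:1-10:___
print(subseq_header("chr1.fa", 1, 10, reverse=True))   # >chr1.fa:1-10:__-
```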



Thanks,
-Michiel.


--- On Fri, 3/16/12, Brooke Rhead <[email protected]> wrote:

> From: Brooke Rhead <[email protected]>
> Subject: Re: [Genome] fasta-subseq source code
> To: "Michiel de Hoon" <[email protected]>
> Cc: [email protected]
> Date: Friday, March 16, 2012, 4:01 PM
> Hi Michiel,
> 
> We do not have the source code, either (and we are in the process of
> changing our programs to use faFrag instead of fasta-subseq to avoid
> problems should the binary be lost in the future), but the usage
> statement indicates that it uses 1-based coordinates:
> 
> $ ./fasta-subseq -help
> usage: seqfile lo hi [strand] (1 indexed)
> 
> If you like, we can send you our binary so that you can test what it
> does with the Fasta headers.
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> 
> On 3/15/12 8:26 PM, Michiel de Hoon wrote:
> > Dear all,
> >
> > I am looking for the source code (or a binary) of the fasta-subseq
> > program that is used in blastz-run-ucsc to abridge repeat regions.
> > This previous message on the mailing list:
> >
> > https://lists.soe.ucsc.edu/pipermail/genome/2006-June/010902.html
> >
> > says that this program was compiled from PSU source code. However, I
> > couldn't find this program or its source code there. Does anybody
> > know where to find this program? If not, is its usage described
> > somewhere in detail? In particular I am wondering if fasta-subseq
> > uses 1-based coordinates or 0-based coordinates, and if it modifies
> > the header lines in the Fasta file in some way.
> >
> > Thanks,
> > Michiel de Hoon
> > RIKEN Omics Science Center
> > _______________________________________________
> > Genome maillist  -  [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> 

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
