Thank you for following up with the results, Michiel!

--
Brooke Rhead
UCSC Genome Bioinformatics Group
On 3/19/12 5:43 AM, Michiel de Hoon wrote:
> Dear Brooke,
>
> Thank you for sending me the fasta-subseq executable.
>
> I tested the program. For future reference, it indeed assumes 1-based
> inclusive coordinates (so "fasta-subseq filename 1 10" returns the
> first ten bases in the sequence). If strand is '-', then it first
> takes the reverse complement of the sequence, and then finds the
> start and end of the subsequence. So "fasta-subseq filename 1 10 -"
> is in general not the reverse complement of
> "fasta-subseq filename 1 10".
>
> The headers are modified as follows. Without reverse complementing,
> the new header is
>
>     >filename:start-end:___
>
> so with three underscores at the end. With reverse complementing, the
> new header is
>
>     >filename:start-end:__-
>
> so with two underscores and then a single dash.
>
> Thanks,
> -Michiel.
>
> --- On Fri, 3/16/12, Brooke Rhead <[email protected]> wrote:
>
>> From: Brooke Rhead <[email protected]>
>> Subject: Re: [Genome] fasta-subseq source code
>> To: "Michiel de Hoon" <[email protected]>
>> Cc: [email protected]
>> Date: Friday, March 16, 2012, 4:01 PM
>>
>> Hi Michiel,
>>
>> We do not have the source code, either (and we are in the process
>> of changing our programs to use faFrag instead of fasta-subseq to
>> avoid problems should the binary be lost in the future), but the
>> usage statement indicates that it uses 1-based coordinates:
>>
>>   $ ./fasta-subseq -help
>>   usage: seqfile lo hi [strand] (1 indexed)
>>
>> If you like, we can send you our binary so that you can test what
>> it does with the Fasta headers.
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>> On 3/15/12 8:26 PM, Michiel de Hoon wrote:
>>> Dear all,
>>>
>>> I am looking for the source code (or a binary) of the fasta-subseq
>>> program that is used in blastz-run-ucsc to abridge repeat regions.
>>>
>>> This previous message on the mailing list:
>>>
>>> https://lists.soe.ucsc.edu/pipermail/genome/2006-June/010902.html
>>>
>>> says that this program was compiled from PSU source code. However, I
>>> couldn't find this program or its source code there. Does anybody
>>> know where to find this program? If not, is its usage described
>>> somewhere in detail? In particular I am wondering if fasta-subseq
>>> uses 1-based coordinates or 0-based coordinates, and if it modifies
>>> the header lines in the Fasta file in some way.
>>>
>>> Thanks,
>>> Michiel de Hoon
>>> RIKEN Omics Science Center
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
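For anyone replacing fasta-subseq in an existing pipeline, the behaviour Michiel reports above can be modelled with a short Python sketch. This is not the original PSU source, which nobody in the thread could locate. It reproduces only what was observed: 1-based inclusive coordinates, reverse-complementing the whole sequence before slicing when strand is '-', and the "___" / "__-" header suffixes. The function names, the '+' default strand, the range check, and putting the start/end arguments as given into the minus-strand header are all assumptions, not confirmed behaviour.

```python
# Hypothetical model of fasta-subseq as described in this thread.
# NOT the original program; names, defaults and error handling are assumptions.

# Complement table for plain DNA (IUPAC codes other than N are not handled).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fasta_subseq(filename: str, seq: str, lo: int, hi: int, strand: str = "+") -> str:
    """Return a FASTA record for bases lo..hi (1-based, inclusive).

    For strand '-', the WHOLE sequence is reverse-complemented first and
    lo..hi are then counted on the reverse-complemented sequence.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Assumed range check; the real program's handling of bad ranges is unknown.
    if not 1 <= lo <= hi <= len(seq):
        raise ValueError("need 1 <= lo <= hi <= len(seq)")
    source = reverse_complement(seq) if strand == "-" else seq
    # 1-based inclusive [lo, hi] is the 0-based half-open Python slice [lo - 1:hi].
    sub = source[lo - 1:hi]
    # Header suffix as reported: three underscores, or two underscores and a dash.
    suffix = "__-" if strand == "-" else "___"
    return f">{filename}:{lo}-{hi}:{suffix}\n{sub}\n"
```

With seq = "AACCGGTTAC", fasta_subseq("chr.fa", seq, 1, 4) returns the bases AACC, while the same call with strand '-' returns GTAA, not GGTT (the reverse complement of AACC). GGTT comes from the '-' call with 7 10. In general, plus-strand bases lo..hi correspond to L-hi+1..L-lo+1 on the '-' strand, where L is the sequence length.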
