10 points! We do exactly that kind of thing on the Sequences page of the 
Protocols section. After you get all the sequences you like (those of a certain 
length, those that are unique, whatever), you can use the column-choosing tool
to keep only the ID, description, and sequence columns, and then use
change_tab_to_fasta to get back a FASTA file with just the sequences of
interest. A piece of cake for
a bioinformaticist, but literally impossible for a non-programmer without this 
or a similar tool. The coolest part was watching biologists start thinking a 
bit more like bioinformaticists once they realized the possibilities. My goal 
was to give non-programmers these tools, so that we coders would be free to 
work on more interesting, hard stuff. (I never quite got to the "Profit!" step.)
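
For readers who prefer Python, here is a minimal sketch of the last step above: turning ID / description / sequence rows back into FASTA. It is not the Scriptome's own Perl code; the function name rows_to_fasta, the width parameter, and the sample data are all made up for illustration.

```python
def rows_to_fasta(rows, width=60):
    """Format (id, description, sequence) rows as FASTA text.

    Hypothetical helper for illustration; sequences are wrapped at
    `width` characters per line.
    """
    lines = []
    for seq_id, desc, seq in rows:
        # Header is ">id description"; drop the trailing space if desc is empty.
        lines.append(f">{seq_id} {desc}".rstrip())
        # Split the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Made-up tab-delimited input: ID, description, sequence, length.
tab_text = "seq1\tfirst example\tACGTACGT\t8\nseq2\t\tTTTT\t4\n"

# Keep only the first three columns, as the column-choosing step would.
rows = [line.split("\t")[:3] for line in tab_text.splitlines()]

fasta = rows_to_fasta(rows, width=5)
print(fasta, end="")
```

The short width of 5 is only there to make the line wrapping visible on tiny sequences.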

-Amir
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of Martin Gollery [[email protected]]

One nice thing about this approach is that you could then sort them by
length, which might be very handy. You could do things like export all the
sequences of length >x but <y, for example.
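
A rough Python sketch of that length filter, assuming the sequences are already in (ID, description, sequence) rows. The function name select_by_length, its parameters, and the sample rows are invented for this example and are not part of the Scriptome.

```python
def select_by_length(rows, min_len, max_len):
    """Return rows with min_len < sequence length < max_len, shortest first.

    Hypothetical helper; each row is (id, description, sequence).
    """
    kept = [row for row in rows if min_len < len(row[2]) < max_len]
    return sorted(kept, key=lambda row: len(row[2]))


# Made-up rows with sequence lengths 10, 3, 6 and 5.
rows = [
    ("seqA", "long one", "ACGTACGTAC"),
    ("seqB", "short one", "ACG"),
    ("seqC", "medium one", "ACGTAC"),
    ("seqD", "also medium", "ACGTA"),
]

selected = select_by_length(rows, 3, 10)
for seq_id, desc, seq in selected:
    print(seq_id, desc, len(seq), sep="\t")
```

Both bounds are exclusive here, matching the ">x but <y" wording.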

Martin Gollery

On Fri, May 7, 2010 at 6:36 AM, Karger, Amir <[email protected]> wrote:

> Check out the Scriptome (yes, this is an advertisement) at
> http://sysbio.harvard.edu/csb/resources/computational/scriptome/ , which
> is a set of Perl one-liners you cut and paste onto your command line to do
> bio-y text-y things.
>
> Use the change_fasta_to_tab tool to change your fasta to a tab-delimited
> file with ID, description, sequence. Then use the calc_col_length tool on
> the result, which will add another column giving the length of the sequence
> column. You can throw that into Excel and hide the sequence column (or use
> choose_cols_to_delete to make a file without the sequences themselves) and
> then read through it at your leisure.
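
The two steps quoted above (FASTA to tab-delimited, then a length column) can be sketched in Python roughly as follows. This is only an illustration of the idea, not what the Scriptome's Perl one-liners actually do; fasta_to_rows and _make_row are hypothetical names, and the parser assumes the ID is everything up to the first space in the header line.

```python
def _make_row(header, chunks):
    """Build (id, description, sequence, length) from one FASTA record."""
    seq_id, _, desc = header.partition(" ")
    seq = "".join(chunks)
    return seq_id, desc, seq, len(seq)


def fasta_to_rows(lines):
    """Yield one (id, description, sequence, length) tuple per record.

    Hypothetical helper; blank lines and any text before the first
    ">" header are skipped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_row(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_row(header, chunks)


# Made-up input for illustration.
example = """>seq1 first example
ACGT
ACG
>seq2
TTTTT
""".splitlines()

rows = list(fasta_to_rows(example))
for row in rows:
    # Tab-delimited: ID, description, sequence, length.
    print("\t".join(str(field) for field in row))
```

The tab-delimited output can be saved and opened in a spreadsheet, where the sequence column can be hidden or dropped.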

_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb
