Hi Kausik.

On 18/10/2016 19:33, Kausik Datta wrote:
> What I next needed to see is whether this pipeline can handle a FASTA file 
> with similar AND dissimilar sequences in it. Unfortunately, Jalview tried to 
> align all sequences in it, introducing gaps in the middle (naturally) which 
> threw off the redundancy removal process also. I was able to partially remedy 
> this by creating groups of similar sequences, but then I had to do the 
> alignment & redundancy for each group separately. 
This does sound like a limitation with the percent-identity measure
used, since it calculates the degree of similarity including gapped
columns (something that Jalview has done from the beginning). For 100%
identity, however, it is actually unlikely to matter, since for any
reliable alignment algorithm, sequence fragments will be aligned in the
same way as the full length sequences.

Could you give us a little more background? If it is purely about
removing 'identical' fragments, then the '100%' removal will work
because a subsequence and its full length counterpart will be 100%
identical regardless.
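
As a rough illustration of that '100%' case outside Jalview, here is a
minimal Python sketch that drops any sequence that is identical to, or
an exact substring of, another sequence in the same FASTA file. It is
not what Jalview does internally: it works on unaligned sequences and
uses plain string containment. The function names (parse_fasta,
remove_contained) are made up for this example, and the all-against-all
comparison is quadratic, so it is only practical for modest files.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_contained(records):
    """Drop sequences that are identical to, or a substring of, a kept one."""
    kept = []
    # Longest first, so a fragment is always compared against longer
    # sequences. sorted() is stable, so among exact duplicates the one
    # that appears first in the file is the one that survives.
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        query = seq.upper()
        if not any(query in other.upper() for _, other in kept):
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    example = ">full\nMKTAYIAKQR\n>fragment\nTAYIA\n>copy\nMKTAYIAKQR\n>other\nGGGGG\n"
    for header, seq in remove_contained(parse_fasta(example)):
        print(">" + header)
        print(seq)
```

Note that the survivors come back ordered by length rather than in
their original file order.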

> What I have come to realize is that there is probably no single program that 
> can help me do what I am trying to achieve: remove redundancies from a large 
> FASTA file.
You may be correct here - Jalview's redundancy removal function was only
designed for use in comparative analysis. There are some standard methods
for performing this filtering, of course (it's a common step for any
sequencing pipeline) - but again, it depends what you are trying to
achieve!

Does anyone else have any suggestions to help Kausik?

Dr JB Procter, Jalview Coordinator, The Barton Group
Division of Computational Biology, School of Life Sciences
University of Dundee, Dundee DD1 5EH, UK.
+44 1382 388734 | www.jalview.org | www.compbio.dundee.ac.uk

Jalview-discuss mailing list
