Hello,

CD-HIT can remove redundancies from sequence files; the sequences do not need
to be aligned.
http://weizhongli-lab.org/cdhit_suite/cgi-bin/index.cgi?cmd=cd-hit
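
For a small file, the 100% case discussed in the quoted message below (exact
duplicates, and fragments contained in a longer sequence) can also be sketched
without any alignment. This is only an illustration and not CD-HIT's algorithm:
the function names parse_fasta and remove_contained and the example sequences
are made up, and the pairwise scan is quadratic, so it will not scale to large
sets.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def remove_contained(records):
    """Drop sequences identical to, or a substring of, an already kept one."""
    kept = []
    # Longest first, so a fragment is always compared against longer sequences.
    # sorted() is stable, so of two identical sequences the first one is kept.
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        upper = seq.upper()
        if not any(upper in k.upper() for _, k in kept):
            kept.append((header, seq))
    return kept


# Hypothetical input: "b" is a fragment of "a", and "c" duplicates "a".
example = ">a\nMKTAYIAKQR\n>b\nTAYIAK\n>c\nMKTAYIAKQR\n>d\nGGGSSS\n"
for header, seq in remove_contained(parse_fasta(example)):
    print(f">{header}\n{seq}")
```

Only "a" and "d" survive here. Anything below 100% identity needs a real
similarity measure, which is where a dedicated tool such as CD-HIT comes in.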

Andreas

>-----Original Message-----
>From: jalview-discuss-boun...@jalview.org [mailto:jalview-discuss-
>boun...@jalview.org] On Behalf Of Jim Procter
>Sent: 19 October 2016 07:47
>To: jalview-discuss@jalview.org
>Subject: [Jalview-discuss] Redundancy removal for large sets of sequences [was
>Re: Problems installing and then running Jalview on Windows 10 ]
>
>Hi Kausik.
>
>
>On 18/10/2016 19:33, Kausik Datta wrote:
>> What I next needed to see is whether this pipeline can handle a FASTA file
>> with similar AND dissimilar sequences in it. Unfortunately, Jalview tried to
>> align all sequences in it, introducing gaps in the middle (naturally) which
>> threw off the redundancy removal process also. I was able to partially remedy
>> this by creating groups of similar sequences, but then I had to do the
>> alignment & redundancy for each group separately.
>This does sound like a limitation with the percent-identity measure used, since
>it calculates the degree of similarity including gapped columns (something that
>Jalview has done from the beginning). For 100% identity, however, it is
>actually unlikely to matter, since for any reliable alignment algorithm, sequence
>fragments will be aligned in the same way as the full length sequences.
>
>Could you give us a little more background? If it is purely about removing
>'identical' fragments, then the '100%' removal will work because a
>subsequence and its full-length counterpart will be 100% identical regardless.
>
>> What I have come to realize is that there is probably no single program that
>> can help me do what I am trying to achieve: remove redundancies from a large
>> FASTA file.
>You may be correct here - Jalview's redundancy removal function was only
>designed for use in comparative analysis. There are some standard methods for
>performing this filtering, of course (it's a common step for any sequencing
>pipeline) - but again, it depends what you are trying to achieve!
>
>Does anyone else have any suggestions to help Kausik?
>Jim.
>
>
>
>--
>-------------------------------------------------------------------
>Dr JB Procter, Jalview Coordinator, The Barton Group Division of Computational
>Biology, School of Life Sciences University of Dundee, Dundee DD1 5EH, UK.
>+44 1382 388734 | www.jalview.org | www.compbio.dundee.ac.uk
>
>_______________________________________________
>Jalview-discuss mailing list
>Jalview-discuss@jalview.org
>http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss
