I am happy to report that the Alignment -> Mafft with Defaults (prior to the 
redundancy removal) works exactly as expected. Thank you. This is what I am 
trying to achieve. 

What I next needed to see is whether this pipeline can handle a FASTA file with 
similar AND dissimilar sequences in it. Unfortunately, Jalview tried to align 
all sequences in it, introducing gaps in the middle (naturally) which threw off 
the redundancy removal process also. I was able to partially remedy this by 
creating groups of similar sequences, but then I had to run the alignment and 
redundancy removal for each group separately. 

This would be impossible to do for the large FASTA file that I have, which has 
430,000+ sequences and is about 250 MB in size.

What I have come to realize is that there is probably no single program that 
can help me do what I am trying to achieve: remove redundancies from a large 
FASTA file. So I shall use Jalview differently, as a quick check for short 
chunks of sequences.
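
For the narrow case discussed in this thread (identical peptides, and shorter 
peptides wholly contained in a longer one), a minimal sketch of an 
alignment-free filter might look like the following. The helper names 
parse_fasta and remove_contained are hypothetical, the comparison is a plain 
substring test rather than anything Jalview does, and the pairwise loop is 
quadratic, so it is meant for checking small chunks rather than the full 
430,000-sequence file.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_contained(records):
    """Drop exact duplicates and sequences wholly contained in a longer kept one."""
    kept = []
    # Longest first, so a fragment is always compared against its possible parent.
    # sorted() is stable, so among equal lengths the first record in the file wins.
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if not any(seq in kept_seq for _, kept_seq in kept):
            kept.append((header, seq))
    return kept
```

Calling remove_contained(parse_fasta(open("chunk.fasta").read())) on a small 
file (the file name is a placeholder) returns the surviving records, longest 
first.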

Provided the memory issue is resolved. (My last email to you, titled "memory 

Thank you for your most kind and patient help.

Best regards,

-----Original Message-----
From: Jim Procter [mailto:foreverem...@gmail.com] On Behalf Of James Procter
Sent: Tuesday, October 18, 2016 1:09 PM
To: Kausik Datta <kdat...@jhmi.edu>
Subject: Re: [Jalview-discuss] Problems installing and then running Jalview on 
Windows 10

Hi Kausik - I'm very glad that you've found a way that works ! (sleep is hard 
for me when I know there are Jalview users out there with problems!).

I'll again take your questions in turn. As I guess you'd prefer, I've not cc'ed 
the discussion list.

On 18/10/2016 17:41, Kausik Datta wrote:
>>gi|152212369|gb|ABS31340.1| beta-tubulin, partial [Aspergillus 

> The last four sequences, under different accessions, are 100% 
> identical peptides. The first one, a larger peptide, contains the 
> entire sequence of the last four in it. I want to refine my FASTA file 
> to eliminate the replicate sequences post hoc.

OK. Before I go into your questions, let me suggest this workflow:
1. Use one of the alignment services on your imported sequence set 
(Webservice->Alignment->Mafft With Defaults should work fine).
2. Use Remove Redundancies as before. It should work as expected.

Now to explain why this works... your question:
> (a)    The sequence alignment for all five sequences start at position
> 1. Which is why Jalview might be missing (at least the graphical
> representation) that all 5 of these proteins are identical. It thinks 
> the first sequence is off by an amino acid (an ‘S’ stands out)

Jalview only does what you tell it - except for a few defaults. In this case, 
you've given Jalview a set of unaligned sequences in a FASTA file, where for 
each sequence, no start position was specified, so Jalview has assigned 
position 1 to the first residue in each sequence, and shown them without any 
gaps, since none were present in the initial file.

> (b)   I pressed Control+D to remove the redundancies. It removed
> sequences 3-5, leaving 1 and 2 – which, clearly, it considers separate 
> sequences. QUESTION: Is it possible to have Jalview recognize that the 
> smaller peptide is actually a part of the larger peptide?
See my suggested workflow. The Redundancy dialog computes percent identity 
between sequences from the columns of the current alignment, rather than by 
aligning each pair first. We've got an outstanding enhancement about this (see 
http://issues.jalview.org/browse/JAL-514), but it has not yet been implemented.
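
To illustrate why the alignment step matters, here is a small sketch of a 
column-by-column identity score. It is not Jalview's actual implementation: 
the function name column_identity, the toy sequences, and the choice to divide 
by the shorter ungapped length are all assumptions made for the example.

```python
def column_identity(row_a, row_b, gap="-"):
    """Percent identity over shared columns, relative to the shorter ungapped row."""
    # Count columns where both rows hold the same residue (gaps never match).
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    shorter = min(len(row_a.replace(gap, "")), len(row_b.replace(gap, "")))
    return 100.0 * matches / shorter if shorter else 0.0


long_seq = "MSREIVHLQ"   # toy sequence, not taken from the FASTA file
fragment = "SREIVHLQ"    # the same peptide minus its first residue

# Unaligned: both rows start at column 1, so every column is off by one.
unaligned = column_identity(long_seq, fragment)

# Aligned: one leading gap puts the fragment under its matching residues.
aligned = column_identity(long_seq, "-" + fragment)
```

Here unaligned comes out as 0.0 and aligned as 100.0, which is the same effect 
as running the alignment service before Remove Redundancies.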

> (c)    When I try to export the output to FASTA (attached temp2.fasta
> file), it seems to retain the trailing gap marks (----) which will 
> likely cause issues if I try to use this FASTA file for any downstream 
> search process. QUESTION: Is it possible to eliminate these trailing 
> ‘-‘ character gap markers from the generated FASTA file?
It looks like you have 'Pad Gaps' enabled by default. If this is a one-off 
procedure, then first select 'Edit->Pad Gaps' to untick it, and then select 
'Edit->Remove all gaps' to remove all the '-' symbols.  You should then be able 
to export the file.

If you want to disable pad-gaps for all new alignments, you can disable 'Pad 
Gaps' via the 'Editing' panel in your Jalview's Preferences.
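
If a file has already been exported with the '-' symbols in it, they can also 
be stripped afterwards outside Jalview. The following is a minimal sketch, 
assuming a plain FASTA file where '-' is the only gap character; strip_gaps is 
a hypothetical helper name, not a Jalview function.

```python
def strip_gaps(fasta_text, gap_chars="-"):
    """Return FASTA text with gap characters removed from sequence lines only."""
    table = str.maketrans("", "", gap_chars)
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header lines are kept untouched, even if they contain '-'.
            out.append(line)
        else:
            cleaned = line.translate(table)
            if cleaned:  # skip lines that consisted only of gaps
                out.append(cleaned)
    return "\n".join(out) + "\n"
```

For example, strip_gaps(open("temp2.fasta").read()) would return the contents 
of the exported file with the trailing gap markers removed.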

Hope this helps !  Let me know if you have any more questions.

PS. May I send a modified version of this email to Jalview-discuss for the 
benefit of other people on the list ?