I am happy to report that the Alignment -> Mafft with Defaults (prior to the
redundancy removal) works exactly as expected. Thank you. This is what I am
trying to achieve.
What I next needed to see is whether this pipeline can handle a FASTA file with
similar AND dissimilar sequences in it. Unfortunately, Jalview tried to align
all of the sequences together, which (naturally) introduced gaps in the middle
and, in turn, threw off the redundancy removal. I was able to partially remedy
this by creating groups of similar sequences, but then I had to do the
alignment and redundancy removal for each group separately.
This would be impossible to do for the large FASTA file that I have, with
430,000+ sequences and about 250 MB in size.
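At that scale, a scripted pass is one way to handle at least the exact-duplicate part of the problem. The following is a minimal sketch, not a Jalview feature: `read_fasta` and `drop_exact_duplicates` are hypothetical helper names, and it assumes that "redundant" means a byte-for-byte identical sequence (ignoring case). It does not detect a shorter peptide contained in a longer one.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header line, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive).

    Only a 20-byte digest per distinct sequence is held in memory,
    so the input can be streamed from disk.
    """
    seen = set()
    for header, seq in records:
        key = hashlib.sha1(seq.upper().encode("utf-8")).digest()
        if key not in seen:
            seen.add(key)
            yield header, seq


# Hypothetical toy input: record b repeats a, record c is a fragment of a.
demo = [">a", "MKEV", "DEQ", ">b", "MKEVDEQ", ">c", "MKEV"]
for header, seq in drop_exact_duplicates(read_fasta(demo)):
    print(header, seq)
```

For a real file, an open file handle can be passed in place of `demo`. In the toy input, record `c` survives because it is a fragment of `a` rather than a copy; containment needs a separate step.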
What I have come to realize is that there is probably no single program that
can help me do what I am trying to achieve: remove redundancies from a large
FASTA file. So I shall use Jalview differently, as a quick check for short
chunks of sequences.
Provided the memory issue is resolved. (My last email to you, titled "memory
Thank you for your most kind and patient help.
From: Jim Procter [mailto:foreverem...@gmail.com] On Behalf Of James Procter
Sent: Tuesday, October 18, 2016 1:09 PM
To: Kausik Datta <kdat...@jhmi.edu>
Subject: Re: [Jalview-discuss] Problems installing and then running Jalview on
Hi Kausik - I'm very glad that you've found a way that works ! (sleep is hard
for me when I know there are Jalview users out there with problems!).
I'll again take your questions in turn. As I guess you'd prefer, I've not cc'ed
the discussion list.
On 18/10/2016 17:41, Kausik Datta wrote:
>>gi|152212369|gb|ABS31340.1| beta-tubulin, partial [Aspergillus
> The last four sequences, under different accessions, are 100%
> identical peptides. The first one, a larger peptide, contains the
> entire sequence of the last four in it. I want to refine my FASTA file
> to eliminate the replicate sequences post hoc.
OK. Before I go into your questions, let me suggest this workflow:
1. Use one of the alignment services on your imported sequence set
(Webservice->Alignment->Mafft With Defaults should work fine).
2. Use Remove Redundancies as before. It should work as expected.
Now to explain why this works... your question:
> (a) The sequence alignment for all five sequences start at position
> 1. Which is why Jalview might be missing (at least the graphical
> representation) that all 5 of these proteins are identical. It thinks
> the first sequence is off by an amino acid (an ‘S’ stands out)
Jalview only does what you tell it - except for a few defaults. In this case,
you've given Jalview a set of unaligned sequences in a FASTA file, where for
each sequence, no start position was specified, so Jalview has assigned
position 1 to the first residue in each sequence, and shown them without any
gaps, since none were present in the initial file.
> (b) I pressed Control+D to remove the redundancies. It removed
> sequences 3-5, leaving 1 and 2 – which, clearly, it considers separate
> sequences. QUESTION: Is it possible to have Jalview recognize that the
> smaller peptide is actually a part of the larger peptide?
See my suggested workflow. The Redundancy dialog computes percent-identity
between sequences based on the current alignment, rather than the unaligned
pair. We've got an outstanding enhancement about this (see
http://issues.jalview.org/browse/JAL-514), but it has not yet been implemented.
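To illustrate why the alignment matters here, the sketch below computes a simple column-by-column percent identity. It is not Jalview's actual code: `percent_identity` is a hypothetical helper, the two peptides are made up, and skipping gapped columns is an assumption of this example.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Column-wise percent identity between two rows of equal length.

    Columns where either row holds a gap are skipped (an assumption
    of this sketch, not a statement about Jalview's behaviour).
    """
    if len(a) != len(b):
        raise ValueError("rows must have the same aligned length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up peptides: the short one is an exact fragment of the long one,
# which carries one extra leading residue.
long_pep = "SMKEVDEQMLNVQNK"
short_pep = "MKEVDEQMLN"

# As imported from an unaligned FASTA file: both rows start at column 1.
unaligned = short_pep.ljust(len(long_pep), "-")
# After alignment: the fragment is shifted right by one column.
aligned = "-" + short_pep + "-" * (len(long_pep) - len(short_pep) - 1)

print(percent_identity(long_pep, unaligned))
print(percent_identity(long_pep, aligned))
```

With both rows starting at column 1, no column matches, so the fragment looks unrelated; once it is shifted into register, every compared column matches.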
> (c) When I try to export the output to FASTA (attached temp2.fasta
> file), it seems to retain the trailing gap marks (----) which will
> likely cause issues if I try to use this FASTA file for any downstream
> search process. QUESTION: Is it possible to eliminate these trailing
> ‘-’ character gap markers from the generated FASTA file?
It looks like you have 'Pad Gaps' enabled by default. If this is a one-off
procedure, then first select 'Edit->Pad Gaps' to untick it, and then select
'Edit->Remove all gaps' to remove all the '-' symbols. You should then be able
to export the file.
If you want pad-gaps off for all new alignments, untick 'Pad Gaps' in the
'Editing' panel of Jalview's Preferences.
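If a FASTA file has already been exported with gap characters in it, they can also be stripped outside Jalview. A minimal sketch, assuming gaps are written as '-' or '.' and that header lines start with '>'; `strip_gaps` is a hypothetical helper name:

```python
def strip_gaps(fasta_text: str, gap_chars: str = "-.") -> str:
    """Remove gap characters from sequence lines, leaving headers untouched."""
    table = str.maketrans("", "", gap_chars)
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines may legitimately contain '-' or '.'
        else:
            stripped = line.translate(table)
            if stripped:  # drop lines that consisted only of gaps
                out.append(stripped)
    return "\n".join(out) + "\n"


# Hypothetical example input with padded trailing gaps.
example = ">seq1 beta-tubulin\nMKEVDE--QM\n----\n>seq2\nMK.EV\n"
print(strip_gaps(example), end="")
```

Header lines are passed through unchanged so that a '-' in a description such as "beta-tubulin" is not removed.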
Hope this helps ! Let me know if you have any more questions.
PS. May I send a modified version of this email to Jalview-discuss for the
benefit of other people on the list ?