Don Naki wrote:

Hi all,
I have a couple of 'novice' questions...

I can't seem to figure out how to create a SimpleGappedSymbolList from a String. I want to parse "-AQSD--VP-" and create a SimpleGappedSymbolList from it.
ProteinTools has methods to return a SymbolList, Sequence, and GappedSequence from a String, but not a GappedSymbolList. I understand GappedSequence extends GappedSymbolList, but I want just the GappedSymbolList. Alternatively, is there a way to get a GappedSymbolList from a GappedSequence?


We could add a utility method to do this. Why do you /have/ to have a GappedSymbolList that is not a GappedSequence? Is there a specific memory constraint?

A second question is that ProteinTools.createGappedProteinSequence("-AQSD--VP-").seqString() results in the String "XAQSD--VPX". The first and last '-' characters are now represented by 'X'. Is this a special kind of gap symbol? If so, how can I distinguish between '-' and 'X' Symbols?


This is a tokenization bug - the leading/trailing gaps are not being recognised by the tokenizer, and are then replaced by X. It's probably in CharacterTokenization - it needs a special case for AlphabetManager.getGapSymbol() - could someone look at this?
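
To illustrate the kind of special case meant here, below is a minimal stand-alone sketch. It does not use BioJava and is not the actual CharacterTokenization code: the class GapAwareTokenizer, the Kind enum, and both methods are hypothetical names made up for this example, and the residue set is assumed to be the 20 standard amino-acid letters. The idea is simply to test for the gap character before falling back to the "unknown" symbol, so that '-' never ends up as 'X'.

```java
// Hypothetical illustration only; none of these names exist in BioJava.
public class GapAwareTokenizer {
    /** The three outcomes the sketch distinguishes for one input character. */
    public enum Kind { RESIDUE, GAP, UNKNOWN }

    static final char GAP_CHAR = '-';
    static final char UNKNOWN_CHAR = 'X';
    // Assumption: the 20 standard amino-acid one-letter codes.
    static final String RESIDUES = "ACDEFGHIKLMNPQRSTVWY";

    /** Classify one character, checking for the gap BEFORE the unknown fallback. */
    public static Kind classify(char c) {
        if (c == GAP_CHAR) {
            return Kind.GAP; // the special case: a gap is never "unknown"
        }
        if (RESIDUES.indexOf(Character.toUpperCase(c)) >= 0) {
            return Kind.RESIDUE;
        }
        return Kind.UNKNOWN;
    }

    /** Tokenize a string and write it back out, one character per token. */
    public static String roundTrip(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (classify(c)) {
                case GAP:
                    out.append(GAP_CHAR);
                    break;
                case RESIDUE:
                    out.append(Character.toUpperCase(c));
                    break;
                default:
                    out.append(UNKNOWN_CHAR);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Leading and trailing gaps should survive the round trip.
        System.out.println(roundTrip("-AQSD--VP-"));
    }
}
```

With a check like this in place, a caller can also tell a gap from a genuinely unrecognised character by comparing the classification (GAP versus UNKNOWN) rather than the printed letter.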

Thanks in advance,
Don


Matthew
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
