I would say it was partially an oversight on my part & partially done on
purpose (a gap is not a nucleotide, after all). However, I'm all in favour of
being pragmatic here, so let's add them in. If I get an okay from the relevant
parties I'll commit the change.
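
To make the quick fix quoted below concrete, here is a minimal, self-contained
sketch of the idea. It does not use BioJava: GapCompoundSketch, toCompounds()
and render() are hypothetical names, and the assumption that the nulls come
from a failed symbol lookup is mine rather than something confirmed on list.
It only shows that once "-" is registered as a compound, an aligned sequence
no longer contains null elements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a compound set; this is NOT BioJava's DNACompoundSet.
final class GapCompoundSketch {

    // Maps a symbol such as "A" to its compound; here a compound is just its own string.
    private final Map<String, String> compounds = new HashMap<>();

    GapCompoundSketch(boolean withGap) {
        for (String base : new String[] {"A", "C", "G", "T"}) {
            compounds.put(base, base);
        }
        if (withGap) {
            // Mirrors the one-line fix from the thread: register "-" as a compound.
            compounds.put("-", "-");
        }
    }

    // Looks up each character; symbols that are not registered come back as null.
    List<String> toCompounds(String aligned) {
        List<String> result = new ArrayList<>();
        for (char c : aligned.toCharArray()) {
            result.add(compounds.get(String.valueOf(c)));
        }
        return result;
    }

    // Joins compounds into a string, failing loudly rather than emitting "null".
    static String render(List<String> seq) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < seq.size(); i++) {
            String compound = seq.get(i);
            if (compound == null) {
                throw new IllegalStateException("No compound at index " + i);
            }
            sb.append(compound);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aligned = "TTC-GTC";
        // Without the gap compound: prints [T, T, C, null, G, T, C]
        System.out.println(new GapCompoundSketch(false).toCompounds(aligned));
        // With the gap compound: prints TTC-GTC
        System.out.println(render(new GapCompoundSketch(true).toCompounds(aligned)));
    }
}
```

Throwing in render() is one possible answer to the AGTCNULLAGTC question
further down the thread; silently skipping nulls or printing a placeholder
would be alternatives, and which is right is still up for discussion.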
Andy
On 6 Dec 2010, at 18:41, Chris Friedline wrote:
> OK, so here's a quick fix now that I know where to look. In my local
> source I added the following line to the constructor of DNACompoundSet
> and recompiled.
>
> addNucleotideCompound("-", "-");
>
> Not sure if this is the correct place for it in terms of what the devs
> want to do globally, but it gets me moving forward again. Gap
> characters are in AminoAcidCompoundSet so I'm wondering if this was
> just a tiny oversight on the nucleotide front.
>
> Thanks again for the help everyone,
> Chris
>
> On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <[email protected]> wrote:
>> That does help, thanks. However, when calling getAsList() on the
>> aligned sequences and printing, this is what I see. Something seems
>> wrong. It does appear as though null is being inserted where there
>> should be gaps:
>>
>> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
>> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
>> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
>> null, null, null, null, null, null]
>> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
>> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
>> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
>> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
>>
>> Chris
>>
>> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <[email protected]> wrote:
>>> Hi Andy,
>>>
>>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
>>> Does that help?
>>>
>>> Andreas
>>>
>>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <[email protected]> wrote:
>>>> So Chris & I have discussed this off list & we believe it's because
>>>> of a NULL compound element in the Sequence given to the SequenceMixin
>>>> method.
>>>>
>>>> Does anyone on list know how the AlignedSequence code encodes gaps & the
>>>> like?
>>>>
>>>> Andy
>>>>
>>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> Well, that's going into my toStringBuilder() method & that particular line
>>>>> is concerned with asking a compound for its String representation. How
>>>>> often do we get nulls in our Sequences, and how should we deal with them?
>>>>> After all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.
>>>>>
>>>>> Andy
>>>>>
>>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Found another potential error case, this time in beta2 (fresh pull
>>>>>> from git last evening). For more info, please see
>>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit
>>>>>> test passes simply because the pair object is not null, but fails when
>>>>>> trying to extract any information from the pair itself (toString(),
>>>>>> getIdenticals(), etc). The substitution matrix file is from
>>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of
>>>>>> pairwise alignments, which do not all fail, but most do with this same
>>>>>> error.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>> --
>>>>>> PhD Candidate, Integrative Life Sciences
>>>>>> Virginia Commonwealth University
>>>>>> Richmond, VA
>>>>>>
>>>>>> _______________________________________________
>>>>>> Biojava-l mailing list - [email protected]
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>> --
>>>> Andrew Yates Ensembl Genomes Engineer
>>>> EMBL-EBI Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
>>>
>>> --
>>> -----------------------------------------------------------------------
>>> Dr. Andreas Prlic
>>> Senior Scientist, RCSB PDB Protein Data Bank
>>> University of California, San Diego
>>> (+1) 858.246.0526
>>> -----------------------------------------------------------------------