I would say it was partially an oversight on my part & partially done on purpose (a 
gap is not a nucleotide after all). However, I'm all in favour of being 
pragmatic here, so let's add them in. If I get an okay from the relevant parties 
I'll commit the change.
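
For anyone reading along, here is a minimal sketch of why an unregistered gap can end up as null in an aligned sequence, and why registering "-" in the compound set makes it go away. It is a self-contained toy, not BioJava code: ToyCompoundSet, lookup, toCompounds and render are hypothetical names, and it assumes a lookup that returns null for an unknown symbol. It does not reproduce the stack trace from the original report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GapCompoundSketch {

    /** Hypothetical stand-in for a compound set; not the BioJava class. */
    static final class ToyCompoundSet {
        // base symbol -> complement symbol
        private final Map<String, String> complements = new HashMap<>();

        void addCompound(String base, String complement) {
            complements.put(base, complement);
        }

        /** Assumption: an unregistered symbol yields null rather than an error. */
        String lookup(String symbol) {
            return complements.containsKey(symbol) ? symbol : null;
        }
    }

    /** Builds a toy DNA set holding only A, C, G and T (no gap). */
    static ToyCompoundSet dnaWithoutGap() {
        ToyCompoundSet set = new ToyCompoundSet();
        set.addCompound("A", "T");
        set.addCompound("T", "A");
        set.addCompound("C", "G");
        set.addCompound("G", "C");
        return set;
    }

    /** Converts each character of an aligned string into a compound, or null. */
    static List<String> toCompounds(ToyCompoundSet set, String aligned) {
        List<String> compounds = new ArrayList<>();
        for (char c : aligned.toCharArray()) {
            compounds.add(set.lookup(String.valueOf(c)));
        }
        return compounds;
    }

    /** Concatenates compounds; StringBuilder renders a null entry as "null". */
    static String render(List<String> compounds) {
        StringBuilder sb = new StringBuilder();
        for (String compound : compounds) {
            sb.append(compound);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyCompoundSet set = dnaWithoutGap();
        System.out.println(toCompounds(set, "AC-GT")); // [A, C, null, G, T]
        System.out.println(render(toCompounds(set, "AC-GT"))); // ACnullGT

        set.addCompound("-", "-"); // same idea as the one-line fix quoted below
        System.out.println(render(toCompounds(set, "AC-GT"))); // AC-GT
    }
}
```

With the gap registered, the aligned string round-trips unchanged. Whether a gap belongs in the nucleotide set at all is a separate design question.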

Andy

On 6 Dec 2010, at 18:41, Chris Friedline wrote:

> OK, so here's a quick fix now that I know where to look.  In my local
> source I added the following line to the constructor of DNACompoundSet
> and recompiled.
> 
> addNucleotideCompound("-", "-");
> 
> Not sure if this is the correct place for it in terms of what the devs
> want to do globally, but it gets me moving forward again.  Gap
> characters are in AminoAcidCompoundSet so I'm wondering if this was
> just a tiny oversight on the nucleotide front.
> 
> Thanks again for the help everyone,
> Chris
> 
> On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <[email protected]> wrote:
>> That does help, thanks.  However, when calling getAsList() on the
>> aligned sequences and printing, this is what I see.  Something seems
>> wrong.  It does appear as though null is being inserted where there
>> should be gaps.
>> 
>> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
>> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
>> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
>> null, null, null, null, null, null]
>> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
>> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
>> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
>> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
>> 
>> Chris
>> 
>> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <[email protected]> wrote:
>>> Hi Andy,
>>> 
>>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
>>> Does that help?
>>> 
>>> Andreas
>>> 
>>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <[email protected]> wrote:
>>>> So Chris & I have discussed this off list & we believe it's because 
>>>> of a null compound element in the Sequence given to the SequenceMixin 
>>>> method.
>>>> 
>>>> Does anyone on list know how the AlignedSequence code encodes gaps & the 
>>>> like?
>>>> 
>>>> Andy
>>>> 
>>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
>>>> 
>>>>> Hi Chris,
>>>>> 
>>>>> Well, that's going into my toStringBuilder() method & that particular line 
>>>>> is concerned with asking a compound for its String representation. How 
>>>>> often do we get nulls in our Sequences, and how should we deal with them? After 
>>>>> all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.
>>>>> 
>>>>> Andy
>>>>> 
>>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Found another potential error case, this time in beta2 (fresh pull
>>>>>> from git last evening).  For more info, please see
>>>>>> http://pastie.org/1351388 for test case and stack trace.  The JUnit
>>>>>> test passes simply because the pair object is not null, but fails when
>>>>>> trying to extract any information from the pair itself (toString(),
>>>>>> getIdenticals(), etc). The substitution matrix file is from
>>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices.  I'm doing large numbers of
>>>>>> pairwise alignments, which do not all fail, but most do with this same
>>>>>> error.
>>>>>> 
>>>>>> Thanks,
>>>>>> Chris
>>>>>> 
>>>>>> --
>>>>>> PhD Candidate, Integrative Life Sciences
>>>>>> Virginia Commonwealth University
>>>>>> Richmond, VA
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Biojava-l mailing list  -  [email protected]
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Andrew Yates                   Ensembl Genomes Engineer
>>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -----------------------------------------------------------------------
>>> Dr. Andreas Prlic
>>> Senior Scientist, RCSB PDB Protein Data Bank
>>> University of California, San Diego
>>> (+1) 858.246.0526
>>> -----------------------------------------------------------------------
>>> 
>>> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/





