(Cc. to r-sig-genetics)

I'm going to modify the algorithm in haplotype.DNAbin() as follows:

1. Find the sequences that are exactly identical, so that, e.g., the 3 sequences:

A-
AR
AA

would be treated as different at this step.

2. Replace the leading and trailing "-" with N (thus keeping the alignment
gaps only in the 'middle' of sequences).

3. Compute the Hamming distances among haplotypes using 5 states (A, G, C, T,
and "-") and ambiguities so that, e.g., d(A,R)=0, d(G,R)=0, d(A,G)=1, and so on.

4. If all these distances are > 0, then exit.

5. Examine each haplotype and its distances to the others:

5a. If there is only one distance = 0, then pool them in a single haplotype and 
give a warning.

5b. If two or more distances are equal to zero, then keep them separate and 
give a message (possibly attached to the returned object).
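
To make steps 2 and 3 concrete, here is a rough sketch, written in Python
purely for illustration; it is not the pegas code. The names IUPAC, STATES,
mask_end_gaps and hamming are made up for this example, and treating N as
compatible with all 5 states (including "-") is an assumption.

```python
# Illustrative sketch only, not the pegas implementation.
# Each symbol maps to the states it is compatible with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "-": "-",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT-",  # assumption: N matches any of the 5 states
}
STATES = {sym: frozenset(s) for sym, s in IUPAC.items()}


def mask_end_gaps(seq):
    """Step 2: replace leading and trailing '-' with 'N'."""
    core = seq.strip("-")
    if not core:
        return "N" * len(seq)
    lead = len(seq) - len(seq.lstrip("-"))
    trail = len(seq) - len(seq.rstrip("-"))
    return "N" * lead + core + "N" * trail


def hamming(x, y):
    """Step 3: count the sites whose sets of possible states do not overlap."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(STATES[a].isdisjoint(STATES[b]) for a, b in zip(x, y))


print(hamming("A", "R"), hamming("G", "R"), hamming("A", "G"))  # 0 0 1
print(mask_end_gaps("--A-C-"))  # NNA-CN
```

With this definition, the 3 sequences of step 1 (A-, AR, AA) are all at
distance 0 from each other once the trailing gap is masked.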


There could be options to control this algorithm:
- exit after step 1.
- ignore step 2.

At step 5, it seems to make sense to start with the "shortest" sequences and
pool them with the "longer" ones, i.e., "A-" would be pooled with "AA".
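
Here is one possible reading of steps 4 and 5 with this shortest-first rule,
again as a Python sketch and not the pegas code. The assumptions are mine:
"shortest" means fewest non-N sites, a haplotype is only pooled into one with
at least as many non-N sites, and the input is already de-duplicated (step 1)
and end-masked (step 2). All names (CODES, hamming, pool_haplotypes) are
hypothetical.

```python
# Illustrative sketch only, not the pegas implementation.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "-": "-",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT-",  # assumption: N matches any of the 5 states
}


def hamming(x, y):
    """Ambiguity-aware Hamming distance between two aligned sequences."""
    return sum(set(CODES[a]).isdisjoint(CODES[b]) for a, b in zip(x, y))


def pool_haplotypes(haps):
    """Return (groups, ambiguous) for distinct, aligned, end-masked haplotypes.

    groups maps the index of each kept haplotype to the indices pooled in it;
    ambiguous lists the haplotypes at distance 0 from two or more others.
    """
    n = len(haps)
    # Indices of the other haplotypes at distance 0 from each haplotype.
    zero = [[j for j in range(n) if j != i and hamming(haps[i], haps[j]) == 0]
            for i in range(n)]
    if not any(zero):  # step 4: all distances > 0, nothing to pool
        return {i: [i] for i in range(n)}, []

    info = [sum(ch != "N" for ch in h) for h in haps]  # non-N sites
    target = list(range(n))  # target[i] == i: kept as its own haplotype
    ambiguous = []
    for i in sorted(range(n), key=lambda k: info[k]):  # shortest first
        if len(zero[i]) >= 2:  # step 5b: keep separate, report
            ambiguous.append(i)
        elif len(zero[i]) == 1:  # step 5a: pool into the longer haplotype
            j = zero[i][0]
            if target[j] == j and info[j] >= info[i]:
                target[i] = j

    groups = {}
    for i in range(n):
        groups.setdefault(target[i], []).append(i)
    return groups, ambiguous


# "A-" (masked to "AN") is pooled with "AA".
print(pool_haplotypes(["AN", "AA"]))

# Seqs. A, D and C from the message quoted below, already end-masked.
seq_a = "CCCGATTTTATATCAACATTTATTT" + "N" * 6
seq_d = "CCCGATTTT" + "N" * 22
seq_c = "CCCGATTTTATATCACCATTTATTTTGATTT"
print(pool_haplotypes([seq_a, seq_d, seq_c]))
```

Under these assumptions, Seq. D from the quoted example is at distance 0 from
both A and C, so it would be kept separate and reported rather than pooled.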

Comments and suggestions are welcome.

Best,

Emmanuel

----- On 26 Feb 20, at 16:35, Emmanuel Paradis emmanuel.para...@ird.fr wrote:

> Hi Hirra,
> 
> The assignment is not random; it follows the order of the sequences in the
> data:
> 
> - Seqs. A and B are compared and found to be identical so they are both
>   assigned to haplotype I.
> - Seq. C is compared to haplotype I (effectively seq. A) and found to be
>   different so it is assigned to haplotype II.
> - Seq. D is compared to haplotype I and found to be similar and so assigned to
>   haplotype I.
> 
> If you reorder your data and put Seq. C first, you'd obtain that C and D are
> assigned to the same haplotype. The same issue occurs with ambiguous bases.
> 
> These situations certainly deserve an option in haplotype() to handle
> them properly.
> 
> Best,
> 
> Emmanuel
> 
> ----- On 25 Feb 20, at 19:31, Hirra Farooq
> hirra.far...@postgrad.manchester.ac.uk wrote:
> 
>> Hello,
>> 
>> I am using the pegas R package to assign sequences into haplotypes.
>> 
>> I recently tried out a test example with 4 sequences. 2 of the sequences (A and
>> B) are identical, 1 sequence (Seq C) differs from these at only one position
>> (pos 648).
>> The 4th sequence (Seq D) is identical to all but shorter, so it has no residues
>> at the determinant position 648. (See image below)
>> 
>> So pegas correctly assigns A and B to haplotype I and C to haplotype II.
>> However, it also assigns D to I, despite there being no information about
>> which residue is at the determinant position.
>> 
>> I just wanted to know: in cases such as D, where information is missing, does
>> pegas just randomly assign the sequence to a haplotype?
>> 
>> 
>> 
>> 
>>    aln (633..663)                  names
>> [A] CCCGATTTTATATCAACATTTATTT------
>> [D] CCCGATTTT----------------------
>> [B] CCCGATTTTATATCAACATTTATTT------
>> [C] CCCGATTTTATATCACCATTTATTTTGATTT
>> 
>> 
>> Thanks and best wishes,
>> Hirra
>> University of Manchester Student.
>> 
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/