(Cc. to r-sig-genetics) I'm going to modify the algorithm in haplotype.DNAbin() as follows:
1. Find the sequences that are exactly identical, so that, e.g., the 3
   sequences A-, AR, and AA would be treated as different at this step.
2. Replace the leading and trailing "-" with N (thus keeping the alignment
   gaps only in the 'middle' of sequences).
3. Compute the Hamming distances among haplotypes using 5 states (A, G, C, T,
   and "-") and ambiguities so that, e.g., d(A,R)=0, d(G,R)=0, d(A,G)=1, and
   so on.
4. If all these distances are > 0, then exit.
5. Examine each haplotype and its distances to the others:
   5a. If there is only one distance = 0, then pool them in a single
       haplotype and give a warning.
   5b. If two or more distances are equal to zero, then keep them separate
       and give a message (possibly attached to the returned object).

There could be options to control this algorithm:
- exit after step 1.
- ignore step 2.

At step 5, it seems to make sense to start with the "shortest" sequences and
pool them with the "longer" ones, i.e., "A-" would be pooled with "AA".

Comments and suggestions are welcome.

Best,

Emmanuel

----- On 26 Feb 20, at 16:35, Emmanuel Paradis emmanuel.para...@ird.fr wrote:

> Hi Hirra,
>
> The assignment is not random; it follows the order of the sequences in the
> data:
>
> - Seqs. A and B are compared and found to be identical, so they are both
>   assigned to haplotype I.
> - Seq. C is compared to haplotype I (effectively seq. A) and found to be
>   different, so it is assigned to haplotype II.
> - Seq. D is compared to haplotype I and found to be similar, and so is
>   assigned to haplotype I.
>
> If you reorder your data and put Seq. C first, you would find that C and D
> are assigned to the same haplotype. The same issue occurs with ambiguous
> bases.
>
> These situations certainly deserve an option in haplotype() to handle
> them properly.
>
> Best,
>
> Emmanuel
>
> ----- On 25 Feb 20, at 19:31, Hirra Farooq
> hirra.far...@postgrad.manchester.ac.uk wrote:
>
>> Hello,
>>
>> I am using the pegas R package to assign sequences into haplotypes.
>>
>> I recently tried out a test example with 4 sequences. 2 of the sequences
>> (A and B) are identical; 1 sequence (Seq C) differs from these at only one
>> position (pos 648).
>> The 4th sequence (Seq D) is identical to all the others but shorter, so it
>> has no residues at the determinant position 648. (See image below)
>>
>> So pegas correctly assigns A and B to haplotype I and C to haplotype II.
>> However, it also assigns D to I, despite there being no information about
>> which residue is at the determinant position.
>>
>> I just wanted to know: in cases such as D, where information is missing,
>> does pegas just randomly assign the sequence to a haplotype?
>>
>> aln (633..663) names
>> [A] CCCGATTTTATATCAACATTTATTT------
>> [D] CCCGATTTT----------------------
>> [B] CCCGATTTTATATCAACATTTATTT------
>> [C] CCCGATTTTATATCACCATTTATTTTGATTT
>>
>> Thanks and best wishes,
>> Hirra
>> University of Manchester Student.
>>
>> _______________________________________________
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
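
For illustration only, here is a minimal sketch of steps 2 and 3 of the proposal at the top of this thread, applied to the four sequences in the original question. It is written in Python rather than R and is not the pegas code. The names `STATES`, `mask_end_gaps`, `hamming`, and `greedy_assign` are hypothetical, and letting N match the four bases but not "-" is an assumption of the sketch. `greedy_assign` only mimics the order-dependent behaviour described in the reply (each sequence is compared with the first member of each existing haplotype).

```python
# Illustrative sketch only: not the pegas implementation.
# Each symbol maps to the set of states it may represent. "-" is treated as
# a fifth state (step 3). Assumption: N matches the four bases but not "-".
STATES = {
    "A": set("A"), "C": set("C"), "G": set("G"), "T": set("T"),
    "R": set("AG"), "Y": set("CT"), "S": set("CG"), "W": set("AT"),
    "K": set("GT"), "M": set("AC"), "B": set("CGT"), "D": set("AGT"),
    "H": set("ACT"), "V": set("ACG"), "N": set("ACGT"), "-": set("-"),
}


def mask_end_gaps(seq):
    """Step 2: replace leading and trailing '-' with 'N'; keep internal gaps."""
    core = seq.strip("-")
    if not core:  # the sequence consists of gaps only
        return "N" * len(seq)
    lead = len(seq) - len(seq.lstrip("-"))
    trail = len(seq) - len(seq.rstrip("-"))
    return "N" * lead + core + "N" * trail


def hamming(x, y):
    """Step 3: count the sites where the two symbols share no possible state."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(1 for a, b in zip(x, y) if not (STATES[a] & STATES[b]))


def greedy_assign(order, seqs):
    """Assign each sequence to the first haplotype whose representative is at
    distance 0; otherwise open a new haplotype. The result depends on order."""
    reps = []  # first sequence seen for each haplotype
    out = {}
    for name in order:
        s = mask_end_gaps(seqs[name])
        for i, rep in enumerate(reps):
            if hamming(s, rep) == 0:
                out[name] = i + 1
                break
        else:
            reps.append(s)
            out[name] = len(reps)
    return out


# The alignment from the original question (positions 633..663).
seqs = {
    "A": "CCCGATTTTATATCAACATTTATTT" + "-" * 6,
    "B": "CCCGATTTTATATCAACATTTATTT" + "-" * 6,
    "C": "CCCGATTTTATATCACCATTTATTTTGATTT",
    "D": "CCCGATTTT" + "-" * 22,
}
masked = {name: mask_end_gaps(s) for name, s in seqs.items()}

print(hamming("A", "R"), hamming("G", "R"), hamming("A", "G"))  # 0 0 1
print(hamming(masked["A"], masked["C"]))  # 1 (position 648)
print(hamming(masked["D"], masked["A"]), hamming(masked["D"], masked["C"]))  # 0 0
print(greedy_assign(["A", "B", "C", "D"], seqs))  # D joins A and B
print(greedy_assign(["C", "A", "B", "D"], seqs))  # D joins C
```

Under these assumptions, D is at distance 0 from both A/B and C, which is the situation step 5b is meant to flag instead of letting the input order decide.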